Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/02/03 23:20:56 UTC

[GitHub] [hive] pkumarsinha opened a new pull request #1942: HIVE-24733: Handle replication when db location and managed location …

pkumarsinha opened a new pull request #1942:
URL: https://github.com/apache/hive/pull/1942


   …is set to custom location on source
   
   
   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #1942: HIVE-24733: Handle replication when db location and managed location …

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #1942:
URL: https://github.com/apache/hive/pull/1942#discussion_r576094055



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
##########
@@ -162,6 +184,10 @@ private boolean isDbEmpty(String dbName) throws HiveException {
     Map<String, String> parameters = new HashMap<>(dbObj.getParameters());
     parameters.remove(ReplicationSpec.KEY.CURR_STATE_ID.toString());
 
+    parameters.remove(ReplUtils.REPL_IS_CUSTOM_DB_LOC);

Review comment:
       These parameters are passed from the dump to signify whether the database has custom locations on the source.
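
       A rough sketch of the idea in the hunk above: marker parameters that travel with the dump are dropped from a copy of the database parameters. The key strings and the class and method names below are placeholders for illustration, since the actual values of the `ReplUtils` constants are not shown in this thread. Removing a managed-location marker alongside the one shown is an assumption based on the plural "parameters" in the reply.

```java
import java.util.HashMap;
import java.util.Map;

class StripReplMarkers {
  // Placeholder keys (assumption): stand-ins for ReplUtils.REPL_IS_CUSTOM_DB_LOC
  // and ReplUtils.REPL_IS_CUSTOM_DB_MANAGEDLOC, whose values are not in this thread.
  static final String IS_CUSTOM_DB_LOC = "example.is.custom.db.loc";
  static final String IS_CUSTOM_DB_MANAGEDLOC = "example.is.custom.db.managedloc";

  // Returns a copy of the dumped parameters without the marker entries.
  // The input map is left untouched.
  static Map<String, String> withoutMarkers(Map<String, String> paramsFromDump) {
    Map<String, String> parameters = new HashMap<>(paramsFromDump);
    parameters.remove(IS_CUSTOM_DB_LOC);
    parameters.remove(IS_CUSTOM_DB_MANAGEDLOC);
    return parameters;
  }

  public static void main(String[] args) {
    Map<String, String> fromDump = new HashMap<>();
    fromDump.put(IS_CUSTOM_DB_LOC, "true");
    fromDump.put("owner", "hive");
    // Only the regular parameter survives the copy.
    System.out.println(withoutMarkers(fromDump));
  }
}
```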






[GitHub] [hive] pkumarsinha commented on a change in pull request #1942: HIVE-24733: Handle replication when db location and managed location …

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #1942:
URL: https://github.com/apache/hive/pull/1942#discussion_r576959330



##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##########
@@ -485,6 +485,9 @@ private Task getReplLoadRootTask(String sourceDb, String replicadb, boolean isIn
       metricCollector = new BootstrapLoadMetricCollector(replicadb, tuple.dumpLocation, 0,
         confTemp);
     }
+    /* When 'hive.repl.retain.custom.db.locations.on.target' is enabled, the first iteration of repl load would
+       run only database creation task, and only in next iteration of Repl Load Task execution, remaining tasks will be
+       executed. Hence disabling this to perform the test on task optimization.  */

Review comment:
       This test case validates only the tasks, hence I have not added that here.






[GitHub] [hive] pkumarsinha commented on a change in pull request #1942: HIVE-24733: Handle replication when db location and managed location …

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #1942:
URL: https://github.com/apache/hive/pull/1942#discussion_r576086486



##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -554,6 +554,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     REPL_RETAIN_PREV_DUMP_DIR_COUNT("hive.repl.retain.prev.dump.dir.count", 3,
             "Indicates maximium number of latest previously used dump-directories which would be retained when " +
              "hive.repl.retain.prev.dump.dir is set to true"),
+    REPL_RETAIN_CUSTOM_LOCATIONS_FOR_DB_ON_TARGET("hive.repl.retain.custom.db.locations.on.target", true,
+            "Indicates if source database has custom warehouse locations, whether that should be retained on target as well"),

Review comment:
       Currently, we have a single config that controls whether to retain both the external and the managed locations. Do you think we should separate the configs? The only drawback is that it would increase the number of supported combinations.






[GitHub] [hive] pkumarsinha merged pull request #1942: HIVE-24733: Handle replication when db location and managed location …

Posted by GitBox <gi...@apache.org>.
pkumarsinha merged pull request #1942:
URL: https://github.com/apache/hive/pull/1942


   




[GitHub] [hive] aasha commented on a change in pull request #1942: HIVE-24733: Handle replication when db location and managed location …

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #1942:
URL: https://github.com/apache/hive/pull/1942#discussion_r575957081



##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##########
@@ -485,6 +485,7 @@ private Task getReplLoadRootTask(String sourceDb, String replicadb, boolean isIn
       metricCollector = new BootstrapLoadMetricCollector(replicadb, tuple.dumpLocation, 0,
         confTemp);
     }
+    confTemp.setBoolVar(HiveConf.ConfVars.REPL_RETAIN_CUSTOM_LOCATIONS_FOR_DB_ON_TARGET, false);

Review comment:
       Add a comment for setting this to false.

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -554,6 +554,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     REPL_RETAIN_PREV_DUMP_DIR_COUNT("hive.repl.retain.prev.dump.dir.count", 3,
             "Indicates maximium number of latest previously used dump-directories which would be retained when " +
              "hive.repl.retain.prev.dump.dir is set to true"),
+    REPL_RETAIN_CUSTOM_LOCATIONS_FOR_DB_ON_TARGET("hive.repl.retain.custom.db.locations.on.target", true,
+            "Indicates if source database has custom warehouse locations, whether that should be retained on target as well"),

Review comment:
       managed table location?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##########
@@ -244,6 +245,7 @@ a database ( directory )
           scope.database = true;
         }
         dbTracker.debugLog("database");
+        dbEventFound = true;

Review comment:
       Should this be set to true only if scope.database = true? What is tracked using that?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
##########
@@ -162,6 +184,10 @@ private boolean isDbEmpty(String dbName) throws HiveException {
     Map<String, String> parameters = new HashMap<>(dbObj.getParameters());
     parameters.remove(ReplicationSpec.KEY.CURR_STATE_ID.toString());
 
+    parameters.remove(ReplUtils.REPL_IS_CUSTOM_DB_LOC);

Review comment:
       Why are these params needed?

##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExclusiveReplica.java
##########
@@ -522,6 +524,194 @@ public void externalTableReplicationDropDatabase() throws Throwable {
     verifyTableDataExists(replica, dbDataLocReplica, tableName, true);
   }
 
+  @Test
+  public void testCustomWarehouseLocations() throws Throwable {
+    List<String> withClauseOptions = getStagingLocationConfig(primary.repldDir, false);
+    String dbWhManagedLoc = new Path(primary.warehouseRoot.getParent(), "customManagedLoc").toUri().getPath();
+    String dbWhExternalLoc = new Path(primary.externalTableWarehouseRoot.getParent(),
+            "customExternalLoc").toUri().getPath();
+    String srcDb = "srcDb";
+    WarehouseInstance.Tuple tuple = primary
+            .run("create database " + srcDb + " LOCATION '" + dbWhExternalLoc + "' MANAGEDLOCATION '" + dbWhManagedLoc
+                    + "' WITH DBPROPERTIES ( '" + SOURCE_OF_REPLICATION + "' = '1,2,3')")
+            .run("use " + srcDb)
+            .run("create table t1 (id int)")
+            .run("insert into table t1 values (500)")
+            .run("create external table t2 (id int)")
+            .run("insert into table t2 values (1000)")
+            .run("create table tp1 (id int) partitioned by (p int)")
+            .run("insert into tp1 partition(p=1) values(10)")
+            .run("insert into tp1 partition(p=2) values(20)")
+            .dump(srcDb, withClauseOptions);
+
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResult("500")
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResult("1000")
+            .run("show tables like 'tp1'")
+            .verifyResult("tp1")
+            .run("select id from tp1")
+            .verifyResults(new String[]{"10", "20"});
+    verifyCustomDBLocations(dbWhManagedLoc, dbWhExternalLoc, true);
+    primary.run("use " + srcDb)
+            .run("insert into table t1 values (1000)")
+            .run("insert into table t2 values (2000)")
+            .run("insert into tp1 partition(p=1) values(30)")
+            .run("insert into tp1 partition(p=2) values(40)")
+            .dump(srcDb, withClauseOptions);
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResults(new String[]{"500", "1000"})
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResults(new String[]{"1000", "2000"})
+            .run("show tables like 'tp1'")
+            .verifyResult("tp1")
+            .run("select id from tp1")
+            .verifyResults(new String[]{"10", "20", "30", "40"});
+    primary.run("use " + srcDb)
+            .run("insert into table t1 values (2000)")
+            .run("insert into table t2 values (3000)")
+            .run("create table t3 (id int)")
+            .run("insert into table t3 values (3000)")
+            .run("create external table t4 (id int)")
+            .run("insert into table t4 values (4000)")
+            .run("insert into tp1 partition(p=1) values(50)")
+            .run("insert into tp1 partition(p=2) values(60)")
+            .run("create table tp2 (id int) partitioned by (p int)")
+            .run("insert into tp2 partition(p=1) values(100)")
+            .run("insert into tp2 partition(p=2) values(200)")
+            .dump(srcDb, withClauseOptions);
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResults(new String[]{"500", "1000", "2000"})
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResults(new String[]{"1000", "2000", "3000"})
+            .run("show tables like 't3'")
+            .verifyResult("t3")
+            .run("select id from t3")
+            .verifyResults(new String[]{"3000"})
+            .run("show tables like 't4'")
+            .verifyResult("t4")
+            .run("select id from t4")
+            .verifyResults(new String[]{"4000"})
+            .run("select id from tp1")
+            .verifyResults(new String[]{"10", "20", "30", "40", "50", "60"})
+            .run("show tables like 'tp1'")
+            .verifyResult("tp1")
+            .run("select id from tp2")
+            .verifyResults(new String[]{"100", "200"});
+  }
+
+  @Test
+  public void testCustomWarehouseLocationsConf() throws Throwable {
+    List<String> withClauseOptions = getStagingLocationConfig(primary.repldDir, false);
+    String dbWhManagedLoc = new Path(primary.warehouseRoot.getParent(), "customManagedLoc1").toUri().getPath();
+    String dbWhExternalLoc = new Path(primary.externalTableWarehouseRoot.getParent(),
+            "customExternalLoc1").toUri().getPath();
+    String srcDb = "srcDbConf";
+    WarehouseInstance.Tuple tuple = primary
+            .run("create database " + srcDb + " LOCATION '" + dbWhExternalLoc + "' MANAGEDLOCATION '" + dbWhManagedLoc
+                    + "' WITH DBPROPERTIES ( '" + SOURCE_OF_REPLICATION + "' = '1,2,3')")
+            .run("use " + srcDb)
+            .run("create table t1 (id int)")
+            .run("insert into table t1 values (500)")
+            .run("create external table t2 (id int)")
+            .run("insert into table t2 values (1000)")
+            .dump(srcDb, withClauseOptions);
+
+    withClauseOptions.add("'" + HiveConf.ConfVars.REPL_RETAIN_CUSTOM_LOCATIONS_FOR_DB_ON_TARGET.varname + "'='false'");
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResult("500")
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResult("1000");
+    verifyDefaultDBLocations(dbWhManagedLoc, dbWhExternalLoc);
+    primary.run("use " + srcDb)
+            .run("insert into table t1 values (1000)")
+            .run("insert into table t2 values (2000)")
+            .dump(srcDb, withClauseOptions);
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResults(new String[]{"500", "1000"})
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResults(new String[]{"1000", "2000"});
+    primary.run("use " + srcDb)
+            .run("insert into table t1 values (2000)")
+            .run("insert into table t2 values (3000)")
+            .run("create table t3 (id int)")
+            .run("insert into table t3 values (3000)")
+            .run("create external table t4 (id int)")
+            .run("insert into table t4 values (4000)")
+            .dump(srcDb, withClauseOptions);
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResults(new String[]{"500", "1000", "2000"})
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResults(new String[]{"1000", "2000", "3000"})
+            .run("show tables like 't3'")
+            .verifyResult("t3")
+            .run("select id from t3")
+            .verifyResults(new String[]{"3000"})
+            .run("show tables like 't4'")
+            .verifyResult("t4")
+            .run("select id from t4")
+            .verifyResults(new String[]{"4000"});
+  }
+
+  private void verifyCustomDBLocations(String managedCustLocOnSrc, String externalCustLocOnSrc, boolean replaceCustPath)
+          throws Exception {
+    if (replaceCustPath ) {
+      Database replDatabase  = replica.getDatabase(replicatedDbName);
+      String managedCustLocOnTgt = new Path(replDatabase.getManagedLocationUri()).toUri().getPath();
+      Assert.assertEquals(managedCustLocOnSrc,  managedCustLocOnTgt);
+      Assert.assertNotEquals(managedCustLocOnTgt,  replica.warehouseRoot.toUri().getPath());
+      String externalCustLocOnTgt = new Path(replDatabase.getLocationUri()).toUri().getPath();
+      Assert.assertEquals(externalCustLocOnSrc,  externalCustLocOnTgt);
+      Assert.assertNotEquals(externalCustLocOnTgt,  new Path(replica.externalTableWarehouseRoot,
+              replicatedDbName.toLowerCase()  + ".db").toUri().getPath());
+    }
+  }
+
+  private void verifyDefaultDBLocations(String managedCustLocOnSrc, String externalCustLocOnSrc) throws Exception {

Review comment:
       You can reuse the same method as verifyCustomDBLocations, with replaceCustPath set to false.
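
       One possible shape for such a merged helper, as a self-contained sketch only. The body of verifyDefaultDBLocations is not shown in this thread, so the class name, method name, paths, and the exact condition checked here are assumptions, and plain strings stand in for the Hive `Database` and `Path` objects.

```java
class DbLocationCheck {
  // Hypothetical merged helper: a single method covers both expectations.
  // expectRetained = true  -> the target must use the source's custom path.
  // expectRetained = false -> the target must NOT use it (defaults apply).
  static void verifyDbLocation(String locOnSrc, String locOnTgt, boolean expectRetained) {
    boolean retained = locOnSrc.equals(locOnTgt);
    if (retained != expectRetained) {
      throw new AssertionError("source=" + locOnSrc + ", target=" + locOnTgt
          + ", expected retained=" + expectRetained);
    }
  }

  public static void main(String[] args) {
    // Illustrative paths only.
    verifyDbLocation("/wh/customManagedLoc", "/wh/customManagedLoc", true);
    verifyDbLocation("/wh/customManagedLoc", "/wh/managed/replica.db", false);
    System.out.println("both checks passed");
  }
}
```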

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSDatabaseEvent.java
##########
@@ -24,6 +24,7 @@
 import org.apache.hadoop.hive.metastore.api.Database;
 import org.apache.hadoop.hive.ql.exec.repl.bootstrap.events.DatabaseEvent;
 import org.apache.hadoop.hive.ql.parse.EximUtil;
+import org.apache.hadoop.hive.ql.parse.ReplicationSpec;

Review comment:
       unused import

##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExclusiveReplica.java
##########
@@ -522,6 +524,194 @@ public void externalTableReplicationDropDatabase() throws Throwable {
     verifyTableDataExists(replica, dbDataLocReplica, tableName, true);
   }
 
+  @Test
+  public void testCustomWarehouseLocations() throws Throwable {
+    List<String> withClauseOptions = getStagingLocationConfig(primary.repldDir, false);
+    String dbWhManagedLoc = new Path(primary.warehouseRoot.getParent(), "customManagedLoc").toUri().getPath();
+    String dbWhExternalLoc = new Path(primary.externalTableWarehouseRoot.getParent(),
+            "customExternalLoc").toUri().getPath();
+    String srcDb = "srcDb";
+    WarehouseInstance.Tuple tuple = primary
+            .run("create database " + srcDb + " LOCATION '" + dbWhExternalLoc + "' MANAGEDLOCATION '" + dbWhManagedLoc
+                    + "' WITH DBPROPERTIES ( '" + SOURCE_OF_REPLICATION + "' = '1,2,3')")
+            .run("use " + srcDb)
+            .run("create table t1 (id int)")
+            .run("insert into table t1 values (500)")
+            .run("create external table t2 (id int)")
+            .run("insert into table t2 values (1000)")
+            .run("create table tp1 (id int) partitioned by (p int)")
+            .run("insert into tp1 partition(p=1) values(10)")
+            .run("insert into tp1 partition(p=2) values(20)")
+            .dump(srcDb, withClauseOptions);
+
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResult("500")
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResult("1000")
+            .run("show tables like 'tp1'")
+            .verifyResult("tp1")
+            .run("select id from tp1")
+            .verifyResults(new String[]{"10", "20"});
+    verifyCustomDBLocations(dbWhManagedLoc, dbWhExternalLoc, true);
+    primary.run("use " + srcDb)
+            .run("insert into table t1 values (1000)")
+            .run("insert into table t2 values (2000)")
+            .run("insert into tp1 partition(p=1) values(30)")
+            .run("insert into tp1 partition(p=2) values(40)")
+            .dump(srcDb, withClauseOptions);
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResults(new String[]{"500", "1000"})
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResults(new String[]{"1000", "2000"})
+            .run("show tables like 'tp1'")
+            .verifyResult("tp1")
+            .run("select id from tp1")
+            .verifyResults(new String[]{"10", "20", "30", "40"});
+    primary.run("use " + srcDb)
+            .run("insert into table t1 values (2000)")
+            .run("insert into table t2 values (3000)")
+            .run("create table t3 (id int)")
+            .run("insert into table t3 values (3000)")
+            .run("create external table t4 (id int)")
+            .run("insert into table t4 values (4000)")
+            .run("insert into tp1 partition(p=1) values(50)")
+            .run("insert into tp1 partition(p=2) values(60)")
+            .run("create table tp2 (id int) partitioned by (p int)")
+            .run("insert into tp2 partition(p=1) values(100)")
+            .run("insert into tp2 partition(p=2) values(200)")
+            .dump(srcDb, withClauseOptions);
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResults(new String[]{"500", "1000", "2000"})
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResults(new String[]{"1000", "2000", "3000"})
+            .run("show tables like 't3'")
+            .verifyResult("t3")
+            .run("select id from t3")
+            .verifyResults(new String[]{"3000"})
+            .run("show tables like 't4'")
+            .verifyResult("t4")
+            .run("select id from t4")
+            .verifyResults(new String[]{"4000"})
+            .run("select id from tp1")
+            .verifyResults(new String[]{"10", "20", "30", "40", "50", "60"})
+            .run("show tables like 'tp1'")
+            .verifyResult("tp1")
+            .run("select id from tp2")
+            .verifyResults(new String[]{"100", "200"});
+  }
+
+  @Test
+  public void testCustomWarehouseLocationsConf() throws Throwable {
+    List<String> withClauseOptions = getStagingLocationConfig(primary.repldDir, false);
+    String dbWhManagedLoc = new Path(primary.warehouseRoot.getParent(), "customManagedLoc1").toUri().getPath();
+    String dbWhExternalLoc = new Path(primary.externalTableWarehouseRoot.getParent(),
+            "customExternalLoc1").toUri().getPath();
+    String srcDb = "srcDbConf";
+    WarehouseInstance.Tuple tuple = primary
+            .run("create database " + srcDb + " LOCATION '" + dbWhExternalLoc + "' MANAGEDLOCATION '" + dbWhManagedLoc
+                    + "' WITH DBPROPERTIES ( '" + SOURCE_OF_REPLICATION + "' = '1,2,3')")
+            .run("use " + srcDb)
+            .run("create table t1 (id int)")
+            .run("insert into table t1 values (500)")
+            .run("create external table t2 (id int)")
+            .run("insert into table t2 values (1000)")
+            .dump(srcDb, withClauseOptions);
+
+    withClauseOptions.add("'" + HiveConf.ConfVars.REPL_RETAIN_CUSTOM_LOCATIONS_FOR_DB_ON_TARGET.varname + "'='false'");
+    replica.load(replicatedDbName, srcDb, withClauseOptions)
+            .run("use " + replicatedDbName)
+            .run("show tables like 't1'")
+            .verifyResult("t1")
+            .run("select id from t1")
+            .verifyResult("500")
+            .run("show tables like 't2'")
+            .verifyResult("t2")
+            .run("select id from t2")
+            .verifyResult("1000");
+    verifyDefaultDBLocations(dbWhManagedLoc, dbWhExternalLoc);

Review comment:
       Compare the table locations as well.






[GitHub] [hive] aasha commented on a change in pull request #1942: HIVE-24733: Handle replication when db location and managed location …

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #1942:
URL: https://github.com/apache/hive/pull/1942#discussion_r577765531



##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##########
@@ -485,6 +485,9 @@ private Task getReplLoadRootTask(String sourceDb, String replicadb, boolean isIn
       metricCollector = new BootstrapLoadMetricCollector(replicadb, tuple.dumpLocation, 0,
         confTemp);
     }
+    /* When 'hive.repl.retain.custom.db.locations.on.target' is enabled, the first iteration of repl load would
+       run only database creation task, and only in next iteration of Repl Load Task execution, remaining tasks will be
+       executed. Hence disabling this to perform the test on task optimization.  */

Review comment:
       Why is this set to false in BaseReplicationScenarios?






[GitHub] [hive] aasha commented on a change in pull request #1942: HIVE-24733: Handle replication when db location and managed location …

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #1942:
URL: https://github.com/apache/hive/pull/1942#discussion_r576563534



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##########
@@ -245,7 +245,9 @@ a database ( directory )
           scope.database = true;
         }
         dbTracker.debugLog("database");
-        dbEventFound = true;
+        if (scope.database) {

Review comment:
       Do you need a separate check? This can be done at the same place where scope.database is set to true.

##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##########
@@ -485,6 +485,9 @@ private Task getReplLoadRootTask(String sourceDb, String replicadb, boolean isIn
       metricCollector = new BootstrapLoadMetricCollector(replicadb, tuple.dumpLocation, 0,
         confTemp);
     }
+    /* When 'hive.repl.retain.custom.db.locations.on.target' is enabled, the first iteration of repl load would
+       run only database creation task, and only in next iteration of Repl Load Task execution, remaining tasks will be
+       executed. Hence disabling this to perform the test on task optimization.  */

Review comment:
       It's also because the locations would be overwritten, and things like that, right?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
##########
@@ -96,6 +98,26 @@ Database readDbMetadata() throws SemanticException {
     return event.dbInMetadata(dbNameToLoadIn);
   }
 
+  String getDbLocation(Database dbInMetadata) {
+    if (context.hiveConf.getBoolVar(HiveConf.ConfVars.REPL_RETAIN_CUSTOM_LOCATIONS_FOR_DB_ON_TARGET)
+            && Boolean.parseBoolean(dbInMetadata.getParameters().get(ReplUtils.REPL_IS_CUSTOM_DB_LOC))) {
+      String locOnTarget = new Path(dbInMetadata.getLocationUri()).toUri().getPath().toString();
+      LOG.info("Using the custom location {} on the target", locOnTarget);
+      return locOnTarget;
+    }
+    return null;
+  }
+
+  String getDbManagedLocation(Database dbInMetadata) {
+    if (context.hiveConf.getBoolVar(HiveConf.ConfVars.REPL_RETAIN_CUSTOM_LOCATIONS_FOR_DB_ON_TARGET)
+            && Boolean.parseBoolean(dbInMetadata.getParameters().get(ReplUtils.REPL_IS_CUSTOM_DB_MANAGEDLOC))) {

Review comment:
       This might cause a backward compatibility issue?






[GitHub] [hive] pkumarsinha commented on a change in pull request #1942: HIVE-24733: Handle replication when db location and managed location …

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #1942:
URL: https://github.com/apache/hive/pull/1942#discussion_r576964636



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
##########
@@ -96,6 +98,26 @@ Database readDbMetadata() throws SemanticException {
     return event.dbInMetadata(dbNameToLoadIn);
   }
 
+  String getDbLocation(Database dbInMetadata) {
+    if (context.hiveConf.getBoolVar(HiveConf.ConfVars.REPL_RETAIN_CUSTOM_LOCATIONS_FOR_DB_ON_TARGET)
+            && Boolean.parseBoolean(dbInMetadata.getParameters().get(ReplUtils.REPL_IS_CUSTOM_DB_LOC))) {
+      String locOnTarget = new Path(dbInMetadata.getLocationUri()).toUri().getPath().toString();
+      LOG.info("Using the custom location {} on the target", locOnTarget);
+      return locOnTarget;
+    }
+    return null;
+  }
+
+  String getDbManagedLocation(Database dbInMetadata) {
+    if (context.hiveConf.getBoolVar(HiveConf.ConfVars.REPL_RETAIN_CUSTOM_LOCATIONS_FOR_DB_ON_TARGET)
+            && Boolean.parseBoolean(dbInMetadata.getParameters().get(ReplUtils.REPL_IS_CUSTOM_DB_MANAGEDLOC))) {

Review comment:
       No, it will fall back to older behavior.
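
       The fallback follows from how the flag is read in the hunk above: `Boolean.parseBoolean` returns false for a null argument, so a dump that carries no custom-location marker fails the condition and the method returns null. A minimal, self-contained sketch of that check follows; the key string and the class and method names are placeholders for illustration, not the actual Hive constants.

```java
import java.util.HashMap;
import java.util.Map;

class CustomLocationFallback {
  // Placeholder key (assumption): stands in for ReplUtils.REPL_IS_CUSTOM_DB_MANAGEDLOC,
  // whose actual value is not shown in this thread.
  static final String IS_CUSTOM_DB_MANAGEDLOC = "example.is.custom.db.managedloc";

  // Mirrors the shape of getDbManagedLocation in the diff: the location from the
  // dump is used only when the config is enabled AND the dump marked it as custom.
  static String managedLocation(boolean retainCustomLocations,
                                Map<String, String> dbParams,
                                String locationFromDump) {
    if (retainCustomLocations
        && Boolean.parseBoolean(dbParams.get(IS_CUSTOM_DB_MANAGEDLOC))) {
      return locationFromDump;
    }
    return null; // no custom location; per the reply above, the older behavior applies
  }

  public static void main(String[] args) {
    Map<String, String> dumpWithoutMarker = new HashMap<>();
    Map<String, String> dumpWithMarker = new HashMap<>();
    dumpWithMarker.put(IS_CUSTOM_DB_MANAGEDLOC, "true");

    System.out.println(managedLocation(true, dumpWithoutMarker, "/custom/managed")); // null
    System.out.println(managedLocation(true, dumpWithMarker, "/custom/managed"));    // /custom/managed
    System.out.println(managedLocation(false, dumpWithMarker, "/custom/managed"));   // null
  }
}
```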



