You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/05/23 00:50:31 UTC

[GitHub] [hive] hmangla98 opened a new pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask

hmangla98 opened a new pull request #2311:
URL: https://github.com/apache/hive/pull/2311


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644531296



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -625,6 +633,11 @@ public boolean runOneWorkerIteration(
     }
     String cmd = null;
     try {
+      TableName tb = req.tableName;

Review comment:
       If the very first table belongs to DbBeingFailover, it will break the logic for "doWait"

##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
##########
@@ -229,6 +231,15 @@ public static boolean isExternalTable(Table table) {
     return isExternal(params);
   }
 
+  public static boolean isDbBeingFailedOver(Database db) {
+    assert (db != null);
+    Map<String, String> dbParameters = db.getParameters();
+    if ((dbParameters != null) && (dbParameters.containsKey(ReplConst.REPL_FAILOVER_ENABLED))) {
+      return ReplConst.TRUE.equals(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));
+    }
+    return false;

Review comment:
       Does this single line suffice?
   return dbParameters != null && ReplConst.TRUE.equals(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));

##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
##########
@@ -229,6 +231,15 @@ public static boolean isExternalTable(Table table) {
     return isExternal(params);
   }
 
+  public static boolean isDbBeingFailedOver(Database db) {
+    assert (db != null);
+    Map<String, String> dbParameters = db.getParameters();
+    if ((dbParameters != null) && (dbParameters.containsKey(ReplConst.REPL_FAILOVER_ENABLED))) {
+      return ReplConst.TRUE.equals(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));
+    }
+    return false;

Review comment:
       also, ReplConst.TRUE.equals : do we need to handle case sensitiveness? 

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -133,11 +135,21 @@ public void run() {
         LOG.info("Looking for tables using catalog: {} dbPattern: {} tablePattern: {} found: {}", catalogName,
           dbPattern, tablePattern, foundTableMetas.size());
 
+        Map<String, Boolean> databasesToSkip = new HashMap<>();
+
         for (TableMeta tableMeta : foundTableMetas) {
           try {
+            String dbName = MetaStoreUtils.prependCatalogToDbName(tableMeta.getCatName(), tableMeta.getDbName(), conf);
+            if (!databasesToSkip.containsKey(dbName)) {
+              Database db = msc.getDatabase(tableMeta.getCatName(), tableMeta.getDbName());
+              databasesToSkip.put(dbName, isTargetOfReplication(db) || MetaStoreUtils.isDbBeingFailedOver(db));
+            }
+            if (databasesToSkip.get(dbName)) {
+              LOG.info("Skipping table : {}", tableMeta.getTableName());

Review comment:
       use debug.

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -133,11 +135,21 @@ public void run() {
         LOG.info("Looking for tables using catalog: {} dbPattern: {} tablePattern: {} found: {}", catalogName,
           dbPattern, tablePattern, foundTableMetas.size());
 
+        Map<String, Boolean> databasesToSkip = new HashMap<>();
+
         for (TableMeta tableMeta : foundTableMetas) {
           try {
+            String dbName = MetaStoreUtils.prependCatalogToDbName(tableMeta.getCatName(), tableMeta.getDbName(), conf);
+            if (!databasesToSkip.containsKey(dbName)) {
+              Database db = msc.getDatabase(tableMeta.getCatName(), tableMeta.getDbName());
+              databasesToSkip.put(dbName, isTargetOfReplication(db) || MetaStoreUtils.isDbBeingFailedOver(db));

Review comment:
       Add a INFO level log for DB that with why it is getting skipped...

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -625,6 +633,11 @@ public boolean runOneWorkerIteration(
     }
     String cmd = null;
     try {
+      TableName tb = req.tableName;

Review comment:
       add a test where db being failed over is picked up first and later the other db is picked up 

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -210,27 +213,34 @@ private void stopWorkers() {
     }
   }
 
-  private List<AnalyzeWork> processOneTable(TableName fullTableName)
+  private List<AnalyzeWork> processOneTable(TableName fullTableName, Map<String, Boolean> dbsToSkip)
       throws MetaException, NoSuchTxnException, NoSuchObjectException {
     if (isAnalyzeTableInProgress(fullTableName)) return null;
     String cat = fullTableName.getCat(), db = fullTableName.getDb(), tbl = fullTableName.getTable();
+    String dbName = MetaStoreUtils.prependCatalogToDbName(cat,db, conf);
+    if (!dbsToSkip.containsKey(dbName)) {
+      Database database = rs.getDatabase(cat, db);
+      boolean skipDb = false;
+      if (MetaStoreUtils.isDbBeingFailedOver(database)) {
+        skipDb = true;
+        LOG.info("Skipping all the tables which belong to database: {} as it is being failed over", db);
+      } else if (ReplUtils.isTargetOfReplication(database)) {

Review comment:
       There is lot of code duplication between this and PartitionManagement. Can we make not achieve by having a single copy?
   Also,  why do we have two methods for isTargetOfReplication(), can we have just one?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r645057607



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -210,27 +213,34 @@ private void stopWorkers() {
     }
   }
 
-  private List<AnalyzeWork> processOneTable(TableName fullTableName)
+  private List<AnalyzeWork> processOneTable(TableName fullTableName, Map<String, Boolean> dbsToSkip)
       throws MetaException, NoSuchTxnException, NoSuchObjectException {
     if (isAnalyzeTableInProgress(fullTableName)) return null;
     String cat = fullTableName.getCat(), db = fullTableName.getDb(), tbl = fullTableName.getTable();
+    String dbName = MetaStoreUtils.prependCatalogToDbName(cat,db, conf);
+    if (!dbsToSkip.containsKey(dbName)) {
+      Database database = rs.getDatabase(cat, db);
+      boolean skipDb = false;
+      if (MetaStoreUtils.isDbBeingFailedOver(database)) {
+        skipDb = true;
+        LOG.info("Skipping all the tables which belong to database: {} as it is being failed over", db);
+      } else if (ReplUtils.isTargetOfReplication(database)) {

Review comment:
       There is lot of code duplication between this and PartitionManagement. Can we make not achieve by having a single copy?
   Also,  why do we have two methods for isTargetOfReplication(), can we have just one?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643398280



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -94,6 +98,8 @@
   private BlockingQueue<AnalyzeWork> workQueue;
   private Thread[] workers;
 
+  private Set<String> dbsBeingFailedOver;

Review comment:
       Why do we need it at instance level?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644540036



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
##########
@@ -229,6 +231,15 @@ public static boolean isExternalTable(Table table) {
     return isExternal(params);
   }
 
+  public static boolean isDbBeingFailedOver(Database db) {
+    assert (db != null);
+    Map<String, String> dbParameters = db.getParameters();
+    if ((dbParameters != null) && (dbParameters.containsKey(ReplConst.REPL_FAILOVER_ENABLED))) {
+      return ReplConst.TRUE.equals(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));
+    }
+    return false;

Review comment:
       Does this single line suffice?
   return dbParameters != null && ReplConst.TRUE.equals(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] hmangla98 commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643587387



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -94,6 +98,8 @@
   private BlockingQueue<AnalyzeWork> workQueue;
   private Thread[] workers;
 
+  private Set<String> dbsBeingFailedOver;

Review comment:
       This is the only way we can access this set within each iteration of StatsUpdater and also within each execution of actual analysis work after dequeuing from the worker queue. This set is clear when new iteration of this thread kicks in.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha merged pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha merged pull request #2311:
URL: https://github.com/apache/hive/pull/2311


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r642797044



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -137,7 +138,8 @@ public void run() {
           try {
             Table table = msc.getTable(tableMeta.getCatName(), tableMeta.getDbName(), tableMeta.getTableName());
             Database db = msc.getDatabase(table.getCatName(), table.getDbName());
-            if (partitionDiscoveryEnabled(table.getParameters()) && !isTargetOfReplication(db)) {
+            if (partitionDiscoveryEnabled(table.getParameters()) && !isTargetOfReplication(db)

Review comment:
       For any of the tables of a db which is a target of replication, we may not need msc.getTable call




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r640356534



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -99,6 +99,15 @@ public boolean isTargetOfReplication(Database db) {
     return false;
   }
 
+  public static boolean isBeingFailovedOver(Database db) {

Review comment:
       We can move this to some util class to avoid duplication

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -162,6 +164,10 @@ public void run() {
         setupMsckPathInvalidation();
         Configuration msckConf = Msck.getMsckConf(conf);
         for (Table table : candidateTables) {
+          if (MetaStoreUtils.isDbBeingFailedOver(msc.getDatabase(table.getCatName(), table.getDbName()))) {

Review comment:
       This is going to be costly. One HMS call per table. Can we maintain a cache?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -99,6 +99,15 @@ public boolean isTargetOfReplication(Database db) {
     return false;
   }
 
+  public static boolean isBeingFailovedOver(Database db) {

Review comment:
       nit: Typo

##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestPartitionManagement.java
##########
@@ -659,6 +712,45 @@ public void testNoPartitionRetentionForReplTarget() throws TException, Interrupt
     assertEquals(3, partitions.size());
   }
 
+  @Test
+  public void testNoPartitionRetentionForFailoverDb() throws TException, InterruptedException {
+    String dbName = "db_failover";
+    String tableName = "tbl_failover";
+    Map<String, Column> colMap = buildAllColumns();
+    List<String> partKeys = Lists.newArrayList("state", "dt");
+    List<String> partKeyTypes = Lists.newArrayList("string", "date");
+    List<List<String>> partVals = Lists.newArrayList(
+            Lists.newArrayList("__HIVE_DEFAULT_PARTITION__", "1990-01-01"),
+            Lists.newArrayList("CA", "1986-04-28"),
+            Lists.newArrayList("MN", "2018-11-31"));
+    // Check for the existence of partitions 10 seconds after the partition retention period has
+    // elapsed. Gives enough time for the partition retention task to work.
+    long partitionRetentionPeriodMs = 20000;
+    long waitingPeriodForTest = partitionRetentionPeriodMs + 10 * 1000;
+    createMetadata(DEFAULT_CATALOG_NAME, dbName, tableName, partKeys, partKeyTypes, partVals, colMap, false);
+    Table table = client.getTable(dbName, tableName);
+    List<Partition> partitions = client.listPartitions(dbName, tableName, (short) -1);
+    assertEquals(3, partitions.size());
+
+    table.getParameters().put(PartitionManagementTask.DISCOVER_PARTITIONS_TBLPROPERTY, "true");
+    table.getParameters().put(PartitionManagementTask.PARTITION_RETENTION_PERIOD_TBLPROPERTY,
+            partitionRetentionPeriodMs + "ms");
+    client.alter_table(dbName, tableName, table);
+    Database db = client.getDatabase(table.getDbName());
+    db.putToParameters(ReplConst.REPL_FAILOVER_ENABLED, "true");

Review comment:
       May be we. can have two both cases covered, with and without ReplConst.REPL_FAILOVER_ENABLED in the same test.

##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestPartitionManagement.java
##########
@@ -620,6 +620,59 @@ public void testNoPartitionDiscoveryForReplTable() throws Exception {
     assertEquals(3, partitions.size());
   }
 
+  @Test
+  public void testNoPartitionDiscoveryForFailoverDb() throws Exception {
+    String dbName = "db_failover";
+    String tableName = "tbl_failover";
+    Map<String, Column> colMap = buildAllColumns();
+    List<String> partKeys = Lists.newArrayList("state", "dt");
+    List<String> partKeyTypes = Lists.newArrayList("string", "date");
+    List<List<String>> partVals = Lists.newArrayList(
+            Lists.newArrayList("__HIVE_DEFAULT_PARTITION__", "1990-01-01"),
+            Lists.newArrayList("CA", "1986-04-28"),
+            Lists.newArrayList("MN", "2018-11-31"));
+    createMetadata(DEFAULT_CATALOG_NAME, dbName, tableName, partKeys, partKeyTypes, partVals, colMap, false);
+    Table table = client.getTable(dbName, tableName);
+    List<Partition> partitions = client.listPartitions(dbName, tableName, (short) -1);
+    assertEquals(3, partitions.size());
+    String tableLocation = table.getSd().getLocation();
+    URI location = URI.create(tableLocation);
+    Path tablePath = new Path(location);
+    FileSystem fs = FileSystem.get(location, conf);
+    Path newPart1 = new Path(tablePath, "state=WA/dt=2018-12-01");
+    Path newPart2 = new Path(tablePath, "state=UT/dt=2018-12-02");
+    fs.mkdirs(newPart1);
+    fs.mkdirs(newPart2);
+    assertEquals(5, fs.listStatus(tablePath).length);
+    partitions = client.listPartitions(dbName, tableName, (short) -1);
+    assertEquals(3, partitions.size());
+
+    // table property is set to true, but the table is marked as replication target. The new
+    // partitions should not be created
+    table.getParameters().put(PartitionManagementTask.DISCOVER_PARTITIONS_TBLPROPERTY, "true");
+    Database db = client.getDatabase(table.getDbName());
+    db.putToParameters(ReplConst.REPL_FAILOVER_ENABLED, "true");
+    client.alterDatabase(table.getDbName(), db);
+    client.alter_table(dbName, tableName, table);

Review comment:
       Why is this needed?

##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
##########
@@ -228,6 +230,15 @@ public static boolean isExternalTable(Table table) {
     return isExternal(params);
   }
 
+  public static boolean isDbBeingFailedOver(Database db) {
+    assert (db != null);
+    Map<String, String> dbParameters = db.getParameters();
+    if ((dbParameters != null) && (dbParameters.containsKey(ReplConst.REPL_FAILOVER_ENABLED))) {
+      return !StringUtils.isEmpty(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));

Review comment:
       What would the value of repl.failover.enabled be? Is it not 'true' during failover, if so, check for value being true.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r647057749



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/stats/TestStatsUpdaterThread.java
##########
@@ -608,6 +607,63 @@ private void testNoStatsUpdateForReplTable(String tblNamePrefix, String txnPrope
     msClient.close();
   }
 
+  @Test(timeout=80000)
+  public void testNoStatsUpdateForSimpleFailoverDb() throws Exception {
+    testNoStatsUpdateForFailoverDb("simple", "");
+  }
+
+  @Test(timeout=80000)
+  public void testNoStatsUpdateForTxnFailoverDb() throws Exception {
+    testNoStatsUpdateForFailoverDb("txn",
+            "TBLPROPERTIES (\"transactional\"=\"true\",\"transactional_properties\"=\"insert_only\")");
+  }
+
+  private void testNoStatsUpdateForFailoverDb(String tblNamePrefix, String txnProperty) throws Exception {
+    // Set high worker count so we get a longer queue.
+    hiveConf.setInt(MetastoreConf.ConfVars.STATS_AUTO_UPDATE_WORKER_COUNT.getVarname(), 4);
+    String tblWOStats = tblNamePrefix + "_repl_failover_nostats";
+    String ptnTblWOStats = tblNamePrefix + "_ptn_repl_failover_nostats";
+    String dbName = ss.getCurrentDatabase();
+    StatsUpdaterThread su = createUpdater();
+    IMetaStoreClient msClient = new HiveMetaStoreClient(hiveConf);
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER, false);
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER, false);
+
+    executeQuery("create table " + tblWOStats + "(i int, s string) " + txnProperty);
+    executeQuery("insert into " + tblWOStats + "(i, s) values (1, 'test')");
+    verifyStatsUpToDate(tblWOStats, Lists.newArrayList("i"), msClient, false);
+
+    executeQuery("create table " + ptnTblWOStats + "(s string) partitioned by (i int) " + txnProperty);
+    executeQuery("insert into " + ptnTblWOStats + "(i, s) values (1, 'test')");
+    executeQuery("insert into " + ptnTblWOStats + "(i, s) values (2, 'test2')");
+    executeQuery("insert into " + ptnTblWOStats + "(i, s) values (3, 'test3')");
+    verifyPartStatsUpToDate(3, 1, msClient, ptnTblWOStats, false);
+
+    assertTrue(su.runOneIteration());
+    Assert.assertEquals(2, su.getQueueLength());
+    executeQuery("alter database " + dbName + " set dbproperties('" + ReplConst.REPL_FAILOVER_ENABLED + "'='true')");
+    //StatsUpdaterThread would not run analyze commands for the tables which were inserted before
+    //failover property was enabled for that database
+    drainWorkQueue(su, 2);
+    verifyStatsUpToDate(tblWOStats, Lists.newArrayList("i"), msClient, false);
+    verifyPartStatsUpToDate(3, 1, msClient, ptnTblWOStats, false);
+    Assert.assertEquals(0, su.getQueueLength());
+
+    executeQuery("create table new_table(s string) partitioned by (i int) " + txnProperty);
+    executeQuery("insert into new_table(i, s) values (4, 'test4')");
+
+    assertFalse(su.runOneIteration());
+    Assert.assertEquals(0, su.getQueueLength());
+    verifyStatsUpToDate(tblWOStats, Lists.newArrayList("i"), msClient, false);
+    verifyPartStatsUpToDate(3, 1, msClient, ptnTblWOStats, false);
+
+    executeQuery("alter database " + dbName + " set dbproperties('" + ReplConst.REPL_FAILOVER_ENABLED + "'='')");
+    executeQuery("drop table " + tblWOStats);

Review comment:
       Actually it would make sense to re-verify after removing this property REPL_FAILOVER_ENABLED that stat updation is happening 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643786935



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -646,6 +668,10 @@ public boolean runOneWorkerIteration(
     return true;
   }
 
+  private synchronized boolean isDbPresentInFailoverSet(String dbName) {
+    return dbsBeingFailedOver.containsKey(dbName) && dbsBeingFailedOver.get(dbName);

Review comment:
       This will not give the desired protection. The atomicity part is still vulnerable




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644531296



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -625,6 +633,11 @@ public boolean runOneWorkerIteration(
     }
     String cmd = null;
     try {
+      TableName tb = req.tableName;

Review comment:
       If the very first table belongs to DbBeingFailover, it will break the logic for "doWait"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644544768



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -625,6 +633,11 @@ public boolean runOneWorkerIteration(
     }
     String cmd = null;
     try {
+      TableName tb = req.tableName;

Review comment:
       add a test where db being failed over is picked up first and later the other db is picked up 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r642873937



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -214,23 +219,28 @@ private void stopWorkers() {
       throws MetaException, NoSuchTxnException, NoSuchObjectException {
     if (isAnalyzeTableInProgress(fullTableName)) return null;
     String cat = fullTableName.getCat(), db = fullTableName.getDb(), tbl = fullTableName.getTable();
+    String dbName = MetaStoreUtils.prependCatalogToDbName(cat,db, conf);
+    if (!isDbTargetOfReplication.containsKey(dbName) || !isDbBeingFailedOver.containsKey(dbName)) {
+      Database database = rs.getDatabase(cat, db);
+      isDbTargetOfReplication.put(dbName, ReplUtils.isTargetOfReplication(database));
+      isDbBeingFailedOver.put(dbName, MetaStoreUtils.isDbBeingFailedOver(database));

Review comment:
       Why do we need two separate maps, we don;t need the reason for skip, just tracking what to skip is fine no?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644543165



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -133,11 +135,21 @@ public void run() {
         LOG.info("Looking for tables using catalog: {} dbPattern: {} tablePattern: {} found: {}", catalogName,
           dbPattern, tablePattern, foundTableMetas.size());
 
+        Map<String, Boolean> databasesToSkip = new HashMap<>();
+
         for (TableMeta tableMeta : foundTableMetas) {
           try {
+            String dbName = MetaStoreUtils.prependCatalogToDbName(tableMeta.getCatName(), tableMeta.getDbName(), conf);
+            if (!databasesToSkip.containsKey(dbName)) {
+              Database db = msc.getDatabase(tableMeta.getCatName(), tableMeta.getDbName());
+              databasesToSkip.put(dbName, isTargetOfReplication(db) || MetaStoreUtils.isDbBeingFailedOver(db));
+            }
+            if (databasesToSkip.get(dbName)) {
+              LOG.info("Skipping table : {}", tableMeta.getTableName());

Review comment:
       use debug.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] hmangla98 commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r645236553



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -210,27 +213,34 @@ private void stopWorkers() {
     }
   }
 
-  private List<AnalyzeWork> processOneTable(TableName fullTableName)
+  private List<AnalyzeWork> processOneTable(TableName fullTableName, Map<String, Boolean> dbsToSkip)
       throws MetaException, NoSuchTxnException, NoSuchObjectException {
     if (isAnalyzeTableInProgress(fullTableName)) return null;
     String cat = fullTableName.getCat(), db = fullTableName.getDb(), tbl = fullTableName.getTable();
+    String dbName = MetaStoreUtils.prependCatalogToDbName(cat,db, conf);
+    if (!dbsToSkip.containsKey(dbName)) {
+      Database database = rs.getDatabase(cat, db);
+      boolean skipDb = false;
+      if (MetaStoreUtils.isDbBeingFailedOver(database)) {
+        skipDb = true;
+        LOG.info("Skipping all the tables which belong to database: {} as it is being failed over", db);
+      } else if (ReplUtils.isTargetOfReplication(database)) {

Review comment:
       We already had two separate methods declared in both ReplUtils and PartitionManagementTask because ReplUtils package is not accessible in MetastoreThreads. But it can be moved to MetastoreUtils where it can be accessible by both of them.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644540793



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
##########
@@ -229,6 +231,15 @@ public static boolean isExternalTable(Table table) {
     return isExternal(params);
   }
 
+  public static boolean isDbBeingFailedOver(Database db) {
+    assert (db != null);
+    Map<String, String> dbParameters = db.getParameters();
+    if ((dbParameters != null) && (dbParameters.containsKey(ReplConst.REPL_FAILOVER_ENABLED))) {
+      return ReplConst.TRUE.equals(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));
+    }
+    return false;

Review comment:
       also, ReplConst.TRUE.equals : do we need to handle case sensitiveness? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r642867437



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -84,6 +86,9 @@
   private ConcurrentHashMap<String, Boolean> partsInProgress = new ConcurrentHashMap<>();
   private AtomicInteger itemsInProgress = new AtomicInteger(0);
 
+  Map<String, Boolean> isDbTargetOfReplication = new HashMap<>();
+  Map<String, Boolean> isDbBeingFailedOver = new HashMap<>();

Review comment:
       Why do you need it at instance level?
   When is the map getting cleaned?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643661160



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -625,6 +639,16 @@ public boolean runOneWorkerIteration(
     }
     String cmd = null;
     try {
+      TableName tb = req.tableName;
+      String dbName = MetaStoreUtils.prependCatalogToDbName(tb.getCat(),tb.getDb(), conf);
+      if (dbsBeingFailedOver.contains(dbName)
+              || MetaStoreUtils.isDbBeingFailedOver(rs.getDatabase(tb.getCat(), tb.getDb()))) {
+        if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       you can simplify this. We don't need this check all the times




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on pull request #2311:
URL: https://github.com/apache/hive/pull/2311#issuecomment-856505462


   Committed to master.
   
   Thanks for the patch, @hmangla98 !!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha removed a comment on pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha removed a comment on pull request #2311:
URL: https://github.com/apache/hive/pull/2311#issuecomment-856505462


   Committed to master.
   
   Thanks for the patch, @hmangla98 !!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644543669



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -133,11 +135,21 @@ public void run() {
         LOG.info("Looking for tables using catalog: {} dbPattern: {} tablePattern: {} found: {}", catalogName,
           dbPattern, tablePattern, foundTableMetas.size());
 
+        Map<String, Boolean> databasesToSkip = new HashMap<>();
+
         for (TableMeta tableMeta : foundTableMetas) {
           try {
+            String dbName = MetaStoreUtils.prependCatalogToDbName(tableMeta.getCatName(), tableMeta.getDbName(), conf);
+            if (!databasesToSkip.containsKey(dbName)) {
+              Database db = msc.getDatabase(tableMeta.getCatName(), tableMeta.getDbName());
+              databasesToSkip.put(dbName, isTargetOfReplication(db) || MetaStoreUtils.isDbBeingFailedOver(db));

Review comment:
       Add a INFO level log for DB that with why it is getting skipped...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r647056770



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -169,7 +168,7 @@ public void run() {
           // this always runs in 'sync' mode where partitions can be added and dropped
           MsckInfo msckInfo = new MsckInfo(table.getCatName(), table.getDbName(), table.getTableName(),
             null, null, true, true, true, retentionSeconds);
-          executorService.submit(new MsckThread(msckInfo, msckConf, qualifiedTableName, countDownLatch));
+          executorService.submit(new MsckThread(msckInfo, msckConf, qualifiedTableName, countDownLatch, msc));

Review comment:
       Sorry, looks like I missed to notice this change earlier. Sharing HMS client across threads might not be good idea. If all threads makes the call at the same time, the results might get mixed up. It would be better to use separate client for each of them.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643399198



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -625,6 +639,16 @@ public boolean runOneWorkerIteration(
     }
     String cmd = null;
     try {
+      TableName tb = req.tableName;
+      String dbName = MetaStoreUtils.prependCatalogToDbName(tb.getCat(),tb.getDb(), conf);
+      if (dbsBeingFailedOver.contains(dbName)
+              || MetaStoreUtils.isDbBeingFailedOver(rs.getDatabase(tb.getCat(), tb.getDb()))) {
+        if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       How will this condition be true ?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -222,17 +237,31 @@ private void setupMsckPathInvalidation() {
     private Configuration conf;
     private String qualifiedTableName;
     private CountDownLatch countDownLatch;
+    private Set<String> dbsBeingFailedOver;
+    private IMetaStoreClient msc;
 
-    MsckThread(MsckInfo msckInfo, Configuration conf, String qualifiedTableName, CountDownLatch countDownLatch) {
+    MsckThread(MsckInfo msckInfo, Configuration conf, String qualifiedTableName,
+               CountDownLatch countDownLatch, Set<String> dbsBeingFailedOver, IMetaStoreClient msc) {
       this.msckInfo = msckInfo;
       this.conf = conf;
       this.qualifiedTableName = qualifiedTableName;
       this.countDownLatch = countDownLatch;
+      this.dbsBeingFailedOver = dbsBeingFailedOver;
+      this.msc = msc;
     }
 
     @Override
     public void run() {
       try {
+        String dbName = MetaStoreUtils.prependCatalogToDbName(msckInfo.getCatalogName(), msckInfo.getDbName(), conf);
+        if (dbsBeingFailedOver.contains(dbName) ||
+                MetaStoreUtils.isDbBeingFailedOver(msc.getDatabase(msckInfo.getCatalogName(), msckInfo.getDbName()))) {
+          if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       This isn't thread-safe




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] hmangla98 commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r642797415



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -137,7 +138,8 @@ public void run() {
           try {
             Table table = msc.getTable(tableMeta.getCatName(), tableMeta.getDbName(), tableMeta.getTableName());
             Database db = msc.getDatabase(table.getCatName(), table.getDbName());
-            if (partitionDiscoveryEnabled(table.getParameters()) && !isTargetOfReplication(db)) {
+            if (partitionDiscoveryEnabled(table.getParameters()) && !isTargetOfReplication(db)

Review comment:
       Changes already done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r642767227



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -162,6 +164,10 @@ public void run() {
         setupMsckPathInvalidation();
         Configuration msckConf = Msck.getMsckConf(conf);
         for (Table table : candidateTables) {
+          if (MetaStoreUtils.isDbBeingFailedOver(msc.getDatabase(table.getCatName(), table.getDbName()))) {

Review comment:
       Cache can be short lived and we can rebuild it every time. Then also, we will have it at one call per db at max.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] hmangla98 commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r641668822



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
##########
@@ -228,6 +230,15 @@ public static boolean isExternalTable(Table table) {
     return isExternal(params);
   }
 
+  public static boolean isDbBeingFailedOver(Database db) {
+    assert (db != null);
+    Map<String, String> dbParameters = db.getParameters();
+    if ((dbParameters != null) && (dbParameters.containsKey(ReplConst.REPL_FAILOVER_ENABLED))) {
+      return !StringUtils.isEmpty(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] hmangla98 commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r643647370



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -625,6 +639,16 @@ public boolean runOneWorkerIteration(
     }
     String cmd = null;
     try {
+      TableName tb = req.tableName;
+      String dbName = MetaStoreUtils.prependCatalogToDbName(tb.getCat(),tb.getDb(), conf);
+      if (dbsBeingFailedOver.contains(dbName)
+              || MetaStoreUtils.isDbBeingFailedOver(rs.getDatabase(tb.getCat(), tb.getDb()))) {
+        if (!dbsBeingFailedOver.contains(dbName)) {

Review comment:
       if current dbName is not present in dbsBeingFailover set, then it'll add this db to it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] hmangla98 commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r641672545



##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestPartitionManagement.java
##########
@@ -620,6 +620,59 @@ public void testNoPartitionDiscoveryForReplTable() throws Exception {
     assertEquals(3, partitions.size());
   }
 
+  @Test
+  public void testNoPartitionDiscoveryForFailoverDb() throws Exception {
+    String dbName = "db_failover";
+    String tableName = "tbl_failover";
+    Map<String, Column> colMap = buildAllColumns();
+    List<String> partKeys = Lists.newArrayList("state", "dt");
+    List<String> partKeyTypes = Lists.newArrayList("string", "date");
+    List<List<String>> partVals = Lists.newArrayList(
+            Lists.newArrayList("__HIVE_DEFAULT_PARTITION__", "1990-01-01"),
+            Lists.newArrayList("CA", "1986-04-28"),
+            Lists.newArrayList("MN", "2018-11-31"));
+    createMetadata(DEFAULT_CATALOG_NAME, dbName, tableName, partKeys, partKeyTypes, partVals, colMap, false);
+    Table table = client.getTable(dbName, tableName);
+    List<Partition> partitions = client.listPartitions(dbName, tableName, (short) -1);
+    assertEquals(3, partitions.size());
+    String tableLocation = table.getSd().getLocation();
+    URI location = URI.create(tableLocation);
+    Path tablePath = new Path(location);
+    FileSystem fs = FileSystem.get(location, conf);
+    Path newPart1 = new Path(tablePath, "state=WA/dt=2018-12-01");
+    Path newPart2 = new Path(tablePath, "state=UT/dt=2018-12-02");
+    fs.mkdirs(newPart1);
+    fs.mkdirs(newPart2);
+    assertEquals(5, fs.listStatus(tablePath).length);
+    partitions = client.listPartitions(dbName, tableName, (short) -1);
+    assertEquals(3, partitions.size());
+
+    // table property is set to true, but the table is marked as replication target. The new
+    // partitions should not be created
+    table.getParameters().put(PartitionManagementTask.DISCOVER_PARTITIONS_TBLPROPERTY, "true");
+    Database db = client.getDatabase(table.getDbName());
+    db.putToParameters(ReplConst.REPL_FAILOVER_ENABLED, "true");
+    client.alterDatabase(table.getDbName(), db);
+    client.alter_table(dbName, tableName, table);

Review comment:
       Alter table will enable discover partition property for the table.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] hmangla98 commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r645236553



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##########
@@ -210,27 +213,34 @@ private void stopWorkers() {
     }
   }
 
-  private List<AnalyzeWork> processOneTable(TableName fullTableName)
+  private List<AnalyzeWork> processOneTable(TableName fullTableName, Map<String, Boolean> dbsToSkip)
       throws MetaException, NoSuchTxnException, NoSuchObjectException {
     if (isAnalyzeTableInProgress(fullTableName)) return null;
     String cat = fullTableName.getCat(), db = fullTableName.getDb(), tbl = fullTableName.getTable();
+    String dbName = MetaStoreUtils.prependCatalogToDbName(cat,db, conf);
+    if (!dbsToSkip.containsKey(dbName)) {
+      Database database = rs.getDatabase(cat, db);
+      boolean skipDb = false;
+      if (MetaStoreUtils.isDbBeingFailedOver(database)) {
+        skipDb = true;
+        LOG.info("Skipping all the tables which belong to database: {} as it is being failed over", db);
+      } else if (ReplUtils.isTargetOfReplication(database)) {

Review comment:
       We already had two separate methods declared in both ReplUtils and PartitionManagementTask because ReplUtils package is not accessible in MetastoreThreads. But it can be moved to MetastoreUtils where it can be accessible by both of them.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] hmangla98 commented on a change in pull request #2311: HIVE-25154: Disable StatsUpdaterThread and PartitionManagementTask for db which is being failed over.

Posted by GitBox <gi...@apache.org>.
hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r641674237



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##########
@@ -162,6 +164,10 @@ public void run() {
         setupMsckPathInvalidation();
         Configuration msckConf = Msck.getMsckConf(conf);
         for (Table table : candidateTables) {
+          if (MetaStoreUtils.isDbBeingFailedOver(msc.getDatabase(table.getCatName(), table.getDbName()))) {

Review comment:
       Might not be doable. Say somehow we cached db failover property and after this repl.failover.enbaled prop is set to true for that db. Now, how will we figure out that cached data is outdated and needs to be fetched again?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org