You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/03/12 17:41:21 UTC

[GitHub] [hive] klcopp opened a new pull request #2068: HIVE-24881: Abort old open replication txns

klcopp opened a new pull request #2068:
URL: https://github.com/apache/hive/pull/2068


   We should auto-abort/remove open replication txns that are older than a time threshold (default: 24h).
   
   ### How was this patch tested?
   Unit test
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598755969



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5086,8 +5088,13 @@ public void performTimeOuts() {
       while(true) {
         stmt = dbConn.createStatement();
         String s = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +

Review comment:
       It's more concise now, and I think pretty readable




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r599584896



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5086,8 +5088,13 @@ public void performTimeOuts() {
       while(true) {
         stmt = dbConn.createStatement();
         String s = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +

Review comment:
       Yes but this needs to go in ASAP so I will skip this extra cleanup task. New changes are not making it worse than it is




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r594942599



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java
##########
@@ -459,11 +476,11 @@ public void testHeartbeater() throws Exception {
     // Case 3: If there's delay for the heartbeat, and the delay is long enough to trigger the reaper,
     //         then the txn will time out and be aborted.
     //         Here we just don't send the heartbeat at all - an infinite delay.
-    // Start the heartbeat after a delay, which exceeds the HIVE_TXN_TIMEOUT
-    ((DbTxnManager) txnMgr).openTxn(ctx, "jerry", TxnType.DEFAULT,
-        HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS) * 2);
+    // Start the heartbeat after a delay, which exceeds the timeout threshold (e.g. TXN_TIMEOUT)
+    ((DbTxnManager) txnMgr).openTxn(ctx, "jerry", txnType,
+        MetastoreConf.getTimeVar(conf, timeThresholdConfVar, TimeUnit.MILLISECONDS) * 2);
     txnMgr.acquireLocks(qp, ctx, "jerry");
-    Thread.sleep(HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS));
+    Thread.sleep(MetastoreConf.getTimeVar(conf, timeThresholdConfVar, TimeUnit.MILLISECONDS));

Review comment:
       Is this using the default value of 24 hours? that is going to be a slow test :) The other timeout config is overridden in the setup to 10 sec




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
aasha commented on pull request #2068:
URL: https://github.com/apache/hive/pull/2068#issuecomment-800077351


   Events are replicated sequencially. There may be cases where in one replication schedule, the open transaction is replicated but the commit will be replicated in the next schedule. So aborting repl transactions in a given timeframe will break replication.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r599287490



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -377,6 +377,8 @@ public void setConf(Configuration conf){
     numOpenTxns = Metrics.getOrCreateGauge(MetricsConstants.NUM_OPEN_TXNS);
 
     timeout = MetastoreConf.getTimeVar(conf, ConfVars.TXN_TIMEOUT, TimeUnit.MILLISECONDS);
+    replicationTxnTimeout = MetastoreConf.getTimeVar(conf, ConfVars.REPL_EVENT_DB_LISTENER_TTL, TimeUnit.MILLISECONDS) +

Review comment:
       REPL_EVENT_DB_LISTENER_TTL value should be picked from source cluster. If the value is different in source and target cluster, we can't automatically determine this. Instead we should update the timeout config description and provide this as a guideline.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r597397758



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##########
@@ -1031,6 +1031,9 @@ public static ConfVars getMetaConf(String name) {
     REPL_METRICS_MAX_AGE("metastore.repl.metrics.max.age",
       "hive.metastore.repl.metrics.max.age", 7, TimeUnit.DAYS,
       "Maximal age of a replication metrics entry before it is removed."),
+    REPL_TXN_TIMEOUT("metastore.repl.txn.timeout", "hive.repl.txn.timeout", 1, TimeUnit.DAYS,

Review comment:
       The default value of this should be a very high value.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r599599368



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5086,8 +5088,13 @@ public void performTimeOuts() {
       while(true) {
         stmt = dbConn.createStatement();
         String s = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +

Review comment:
       ok




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598624457



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5067,6 +5069,25 @@ private void timeOutLocks(Connection dbConn) {
    */
   @RetrySemantics.Idempotent
   public void performTimeOuts() {
+    try {
+      // time out non-replication txns
+      String epochFn = getEpochFn(dbProduct);
+      String timeoutTxnsQuery = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +
+              " AND \"TXN_LAST_HEARTBEAT\" <  " + epochFn + "-" + timeout +
+              " AND \"TXN_TYPE\" != " + TxnType.REPL_CREATED.getValue();
+      performTimeOutsInternal(timeoutTxnsQuery);

Review comment:
       Why do we need 2 db connections here to perform the timeout? Could we change this to a single query to avoid doing FULL table scan several times?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598535703



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java
##########
@@ -486,6 +505,8 @@ public void setUp() throws Exception {
     conf.setTimeVar(HiveConf.ConfVars.HIVE_TIMEDOUT_TXN_REAPER_START, 0, TimeUnit.SECONDS);
     conf.setTimeVar(HiveConf.ConfVars.HIVE_TXN_TIMEOUT, 10, TimeUnit.SECONDS);
     houseKeeperService = new AcidHouseKeeperService();
+    MetastoreConf.setTimeVar(conf, MetastoreConf.ConfVars.REPL_TXN_TIMEOUT, 1, TimeUnit.MINUTES);

Review comment:
       It can be less




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598534948



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5086,8 +5093,10 @@ public void performTimeOuts() {
       while(true) {
         stmt = dbConn.createStatement();
         String s = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +
-            " AND \"TXN_LAST_HEARTBEAT\" <  " + getEpochFn(dbProduct) + "-" + timeout +
-            " AND \"TXN_TYPE\" != " + TxnType.REPL_CREATED.getValue();
+            " AND \"TXN_LAST_HEARTBEAT\" <  " + getEpochFn(dbProduct) + "-" +
+            (replication ? replicationTxnTimeout : timeout) +
+            " AND \"TXN_TYPE\" " +

Review comment:
       @aasha Thanks for taking a look.
   
   performTimeOuts() calls performTimeOutsInternal twice:
   1. times out non-repl txns -> timeout is timeout and TXN_TYPE != REPL_CREATED
   2. times out repl txns -> timeout is replicationTxnTimeout and TXN_TYPE=REPL_CREATED
   
   I supposed this could be optimized but I don't see how it is incorrect?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598629117



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5086,8 +5088,13 @@ public void performTimeOuts() {
       while(true) {
         stmt = dbConn.createStatement();
         String s = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +

Review comment:
       maybe move query to the constants and replace TIMEOUT placeholders here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r594944972



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5067,6 +5069,25 @@ private void timeOutLocks(Connection dbConn) {
    */
   @RetrySemantics.Idempotent
   public void performTimeOuts() {
+    try {
+      // time out non-replication txns
+      String epochFn = getEpochFn(dbProduct);
+      String timeoutTxnsQuery = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +
+              " AND \"TXN_LAST_HEARTBEAT\" <  " + epochFn + "-" + timeout +
+              " AND \"TXN_TYPE\" != " + TxnType.REPL_CREATED.getValue();
+      performTimeOutsInternal(timeoutTxnsQuery);
+
+      // time out replication txns (has different timeout value)
+      String timeoutReplicationTxnsQuery = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +

Review comment:
       I think it should go in one query with some OR criteria




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r594942599



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java
##########
@@ -459,11 +476,11 @@ public void testHeartbeater() throws Exception {
     // Case 3: If there's delay for the heartbeat, and the delay is long enough to trigger the reaper,
     //         then the txn will time out and be aborted.
     //         Here we just don't send the heartbeat at all - an infinite delay.
-    // Start the heartbeat after a delay, which exceeds the HIVE_TXN_TIMEOUT
-    ((DbTxnManager) txnMgr).openTxn(ctx, "jerry", TxnType.DEFAULT,
-        HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS) * 2);
+    // Start the heartbeat after a delay, which exceeds the timeout threshold (e.g. TXN_TIMEOUT)
+    ((DbTxnManager) txnMgr).openTxn(ctx, "jerry", txnType,
+        MetastoreConf.getTimeVar(conf, timeThresholdConfVar, TimeUnit.MILLISECONDS) * 2);
     txnMgr.acquireLocks(qp, ctx, "jerry");
-    Thread.sleep(HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS));
+    Thread.sleep(MetastoreConf.getTimeVar(conf, timeThresholdConfVar, TimeUnit.MILLISECONDS));

Review comment:
       Is this using the default value of 24 hours? that is going to be a slow test :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598638452



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5067,6 +5069,25 @@ private void timeOutLocks(Connection dbConn) {
    */
   @RetrySemantics.Idempotent
   public void performTimeOuts() {
+    try {
+      // time out non-replication txns
+      String epochFn = getEpochFn(dbProduct);
+      String timeoutTxnsQuery = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +
+              " AND \"TXN_LAST_HEARTBEAT\" <  " + epochFn + "-" + timeout +
+              " AND \"TXN_TYPE\" != " + TxnType.REPL_CREATED.getValue();
+      performTimeOutsInternal(timeoutTxnsQuery);

Review comment:
       Ok now it should be ok




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r597512918



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##########
@@ -1031,6 +1031,9 @@ public static ConfVars getMetaConf(String name) {
     REPL_METRICS_MAX_AGE("metastore.repl.metrics.max.age",
       "hive.metastore.repl.metrics.max.age", 7, TimeUnit.DAYS,
       "Maximal age of a replication metrics entry before it is removed."),
+    REPL_TXN_TIMEOUT("metastore.repl.txn.timeout", "hive.repl.txn.timeout", 1, TimeUnit.DAYS,

Review comment:
       REPL_TXN_TIMEOUT  is the time after hive.repl.event.db.listener.timetolive after which replication transactions are timed out. So the real timeout is (hive.repl.event.db.listener.timetolive  + REPL_TXN_TIMEOUT).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598246581



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java
##########
@@ -486,6 +505,8 @@ public void setUp() throws Exception {
     conf.setTimeVar(HiveConf.ConfVars.HIVE_TIMEDOUT_TXN_REAPER_START, 0, TimeUnit.SECONDS);
     conf.setTimeVar(HiveConf.ConfVars.HIVE_TXN_TIMEOUT, 10, TimeUnit.SECONDS);
     houseKeeperService = new AcidHouseKeeperService();
+    MetastoreConf.setTimeVar(conf, MetastoreConf.ConfVars.REPL_TXN_TIMEOUT, 1, TimeUnit.MINUTES);

Review comment:
       Do we need the test time interval to be a min? Or it can be lesser.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r599360873



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5086,8 +5088,13 @@ public void performTimeOuts() {
       while(true) {
         stmt = dbConn.createStatement();
         String s = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +

Review comment:
       note: you'll be polluting memory with many String objects every time this method is called.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r597512018



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##########
@@ -1031,6 +1031,9 @@ public static ConfVars getMetaConf(String name) {
     REPL_METRICS_MAX_AGE("metastore.repl.metrics.max.age",
       "hive.metastore.repl.metrics.max.age", 7, TimeUnit.DAYS,
       "Maximal age of a replication metrics entry before it is removed."),
+    REPL_TXN_TIMEOUT("metastore.repl.txn.timeout", "hive.repl.txn.timeout", 1, TimeUnit.DAYS,

Review comment:
       Thanks for taking a look! What default value do you suggest?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598629642



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5067,6 +5069,25 @@ private void timeOutLocks(Connection dbConn) {
    */
   @RetrySemantics.Idempotent
   public void performTimeOuts() {
+    try {
+      // time out non-replication txns
+      String epochFn = getEpochFn(dbProduct);
+      String timeoutTxnsQuery = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +
+              " AND \"TXN_LAST_HEARTBEAT\" <  " + epochFn + "-" + timeout +
+              " AND \"TXN_TYPE\" != " + TxnType.REPL_CREATED.getValue();
+      performTimeOutsInternal(timeoutTxnsQuery);

Review comment:
       I did in the newest commit. However there's a mistake somewhere so maybe don't check yet :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on pull request #2068:
URL: https://github.com/apache/hive/pull/2068#issuecomment-800081572


   @aasha Thanks for your comment, this was the input I was looking for.
   I've seen a case where a replication txn was open for more than a month. This blocks the compactor Cleaner thread from removing obsolete files from the FS which is a big problem. What do you think we should do with open replication txns this old?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha edited a comment on pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
aasha edited a comment on pull request #2068:
URL: https://github.com/apache/hive/pull/2068#issuecomment-800077351


   Events are replicated sequencially. There may be cases where in one replication schedule, the open transaction is replicated but the commit will be replicated in the next schedule. The next schedule may be much longer.So aborting repl transactions in a given timeframe will break replication.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r594958435



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java
##########
@@ -459,11 +476,11 @@ public void testHeartbeater() throws Exception {
     // Case 3: If there's delay for the heartbeat, and the delay is long enough to trigger the reaper,
     //         then the txn will time out and be aborted.
     //         Here we just don't send the heartbeat at all - an infinite delay.
-    // Start the heartbeat after a delay, which exceeds the HIVE_TXN_TIMEOUT
-    ((DbTxnManager) txnMgr).openTxn(ctx, "jerry", TxnType.DEFAULT,
-        HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS) * 2);
+    // Start the heartbeat after a delay, which exceeds the timeout threshold (e.g. TXN_TIMEOUT)
+    ((DbTxnManager) txnMgr).openTxn(ctx, "jerry", txnType,
+        MetastoreConf.getTimeVar(conf, timeThresholdConfVar, TimeUnit.MILLISECONDS) * 2);
     txnMgr.acquireLocks(qp, ctx, "jerry");
-    Thread.sleep(HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS));
+    Thread.sleep(MetastoreConf.getTimeVar(conf, timeThresholdConfVar, TimeUnit.MILLISECONDS));

Review comment:
       derp :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598247086



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5086,8 +5093,10 @@ public void performTimeOuts() {
       while(true) {
         stmt = dbConn.createStatement();
         String s = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +
-            " AND \"TXN_LAST_HEARTBEAT\" <  " + getEpochFn(dbProduct) + "-" + timeout +
-            " AND \"TXN_TYPE\" != " + TxnType.REPL_CREATED.getValue();
+            " AND \"TXN_LAST_HEARTBEAT\" <  " + getEpochFn(dbProduct) + "-" +
+            (replication ? replicationTxnTimeout : timeout) +
+            " AND \"TXN_TYPE\" " +

Review comment:
       TXN_TYPE will always be REPL_CREATED for replication. This check looks incorrect.
   It could be something like if txn type is REPL_CREATED, use replicationTxnTimeout else timeout




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp closed pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp closed pull request #2068:
URL: https://github.com/apache/hive/pull/2068


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598629117



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5086,8 +5088,13 @@ public void performTimeOuts() {
       while(true) {
         stmt = dbConn.createStatement();
         String s = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +

Review comment:
       maybe move query to the constants and replace TXN_LAST_HEARTBEAT placeholders here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2068: HIVE-24881: Abort old open replication txns

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r598543367



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -5086,8 +5093,10 @@ public void performTimeOuts() {
       while(true) {
         stmt = dbConn.createStatement();
         String s = " \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.OPEN +
-            " AND \"TXN_LAST_HEARTBEAT\" <  " + getEpochFn(dbProduct) + "-" + timeout +
-            " AND \"TXN_TYPE\" != " + TxnType.REPL_CREATED.getValue();
+            " AND \"TXN_LAST_HEARTBEAT\" <  " + getEpochFn(dbProduct) + "-" +
+            (replication ? replicationTxnTimeout : timeout) +
+            " AND \"TXN_TYPE\" " +

Review comment:
       @aasha I have updated the PR with a more concise/optimal solution




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org