You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/04/04 07:18:42 UTC

[GitHub] [hive] veghlaci05 opened a new pull request, #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

veghlaci05 opened a new pull request, #3172:
URL: https://github.com/apache/hive/pull/3172

   ### What changes were proposed in this pull request?
   Currently the users can manually trigger multiple compactions for the same table/partition in a short period. As a result, after the first request, all the others will set to 'ready for cleaning' state, but actually there's nothing to clean.
   
   ### Why are the changes needed?
   Changes in this PR prevent the users from submitting multiple compaction requests for the same table. After the first request, all subsequent requests will be refused until there is a change in the table.
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   ### How was this patch tested?
   Manually and through unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] veghlaci05 commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
veghlaci05 commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r860675617


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java:
##########
@@ -197,6 +198,9 @@ static void gatherStats(CompactionInfo ci, HiveConf conf, String userName, Strin
           statusUpdaterConf.set(TezConfiguration.TEZ_QUEUE_NAME, compactionQueueName);
         }
         SessionState sessionState = DriverUtils.setUpSessionState(statusUpdaterConf, userName, true);
+        Map<String, String> hiveVariables = sessionState.getHiveVariables();
+        hiveVariables.put(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG, "true");
+        sessionState.setHiveVariables(hiveVariables);

Review Comment:
   Sure, I'll create it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] veghlaci05 commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
veghlaci05 commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r857385048


##########
ql/src/test/queries/clientpositive/acid_insert_overwrite_update.q:
##########
@@ -26,7 +26,6 @@ insert overwrite table sequential_update values(current_timestamp, 0, current_ti
 delete from sequential_update where seq=2;
 select distinct IF(seq==0, 'LOOKS OKAY', 'BROKEN'), regexp_extract(INPUT__FILE__NAME, '.*/(.*)/[^/]*', 1) from sequential_update;
 
-alter table sequential_update compact 'major';

Review Comment:
   It turned out that the Q test running environment doesn't start the HMS background threads at all. As a result the issued compactions are never processed. From now on it is not possible to initiate a second compaction on the same table with the same write id before the previous one is cleaned up. As a result, the second compaction request was refused in this test. BTW this questions the necessity of the compaction commands in these tests for me, but I only removed only the second one  to make the test green again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
klcopp commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r857405715


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+      //and for duplicate compaction detection it is necessary to not increment it.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));

Review Comment:
   Wouldn't it be a bit nicer to put the flag in DriverContext or something, instead of using a session state variable? This is just a suggestion so feel free to take it or leave it...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r861707800


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java:
##########
@@ -522,8 +522,9 @@ protected Boolean findNextCompactionAndExecute(boolean collectGenericStats, bool
           UserGroupInformation ugi = UserGroupInformation.createProxyUser(ci.runAs,
               UserGroupInformation.getLoginUser());
 
+          ValidCompactorWriteIdList finalTblValidWriteIds = tblValidWriteIds;

Review Comment:
   did something change here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r860674098


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java:
##########
@@ -197,6 +198,9 @@ static void gatherStats(CompactionInfo ci, HiveConf conf, String userName, Strin
           statusUpdaterConf.set(TezConfiguration.TEZ_QUEUE_NAME, compactionQueueName);
         }
         SessionState sessionState = DriverUtils.setUpSessionState(statusUpdaterConf, userName, true);
+        Map<String, String> hiveVariables = sessionState.getHiveVariables();
+        hiveVariables.put(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG, "true");
+        sessionState.setHiveVariables(hiveVariables);

Review Comment:
   could we create sessionState.setHiveVariable() method in SessionState?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ merged pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ merged PR #3172:
URL: https://github.com/apache/hive/pull/3172


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r860674098


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java:
##########
@@ -197,6 +198,9 @@ static void gatherStats(CompactionInfo ci, HiveConf conf, String userName, Strin
           statusUpdaterConf.set(TezConfiguration.TEZ_QUEUE_NAME, compactionQueueName);
         }
         SessionState sessionState = DriverUtils.setUpSessionState(statusUpdaterConf, userName, true);
+        Map<String, String> hiveVariables = sessionState.getHiveVariables();
+        hiveVariables.put(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG, "true");
+        sessionState.setHiveVariables(hiveVariables);

Review Comment:
   could we create sessionState.setHiveVariable() method in SessionState?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r861711021


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##########
@@ -3705,12 +3707,22 @@ public CompactionResponse compact(CompactionRequest rqst) throws MetaException {
 
         long id = generateCompactionQueueId(stmt);
 
+        final ValidCompactorWriteIdList tblValidWriteIds = TxnUtils.createValidCompactWriteIdList(
+            getValidWriteIds(new GetValidWriteIdsRequest(

Review Comment:
   could we refactor this a bit, at least move GetValidWriteIdsRequest into a separate declaration



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] veghlaci05 commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
veghlaci05 commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r861721408


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java:
##########
@@ -522,8 +522,9 @@ protected Boolean findNextCompactionAndExecute(boolean collectGenericStats, bool
           UserGroupInformation ugi = UserGroupInformation.createProxyUser(ci.runAs,
               UserGroupInformation.getLoginUser());
 
+          ValidCompactorWriteIdList finalTblValidWriteIds = tblValidWriteIds;

Review Comment:
   I reverted it, it was remainder of the former solution approach



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r861702245


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java:
##########
@@ -46,26 +46,46 @@ private DriverUtils() {
     throw new UnsupportedOperationException("DriverUtils should not be instantiated!");
   }
 
-  public static void runOnDriver(HiveConf conf, String user, SessionState sessionState,
+  @FunctionalInterface
+  private interface DriverCreator {
+    Driver createDriver(QueryState qs);
+  }
+
+  public static void runOnDriver(HiveConf conf, SessionState sessionState,
       String query) throws HiveException {
-    runOnDriver(conf, user, sessionState, query, null, -1);
+    runOnDriver(conf, sessionState, query, null, -1);
   }
 
   /**
    * For Query Based compaction to run the query to generate the compacted data.
    */
-  public static void runOnDriver(HiveConf conf, String user,
+  public static void runOnDriver(HiveConf conf,
       SessionState sessionState, String query, ValidWriteIdList writeIds, long compactorTxnId)
       throws HiveException {
     if(writeIds != null && compactorTxnId < 0) {
       throw new IllegalArgumentException(JavaUtils.txnIdToString(compactorTxnId) +
           " is not valid. Context: " + query);
     }
+    runOnDriverInternal(query, conf, sessionState, (qs) -> new Driver(qs, writeIds, compactorTxnId));
+  }
+
+  /**
+   * For Query Based compaction to run the query to generate the compacted data.
+   */
+  public static void runOnDriver(HiveConf conf, SessionState sessionState, String query, long analyzeTableWriteId)
+      throws HiveException {
+    if(analyzeTableWriteId < 0) {
+      throw new IllegalArgumentException(JavaUtils.txnIdToString(analyzeTableWriteId) +
+          " is not valid. Context: " + query);
+    }
+    runOnDriverInternal(query, conf, sessionState, (qs) -> new Driver(qs, analyzeTableWriteId));
+  }
+
+  private static void runOnDriverInternal(String query, HiveConf conf, SessionState sessionState, DriverCreator creator) throws HiveException {
     SessionState.setCurrentSessionState(sessionState);
     boolean isOk = false;
     try {
-      QueryState qs = new QueryState.Builder().withHiveConf(conf).withGenerateNewQueryId(true).nonIsolated().build();
-      Driver driver = new Driver(qs, null, null, writeIds, compactorTxnId);
+      Driver driver = creator.createDriver(new QueryState.Builder().withHiveConf(conf).withGenerateNewQueryId(true).nonIsolated().build());

Review Comment:
   nit: could we format it , maybe move query state on a new line?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] veghlaci05 commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
veghlaci05 commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r857366146


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+      //and for duplicate compaction detection it is necessary to not increment it.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));

Review Comment:
   Unfortunately that won't work. This marker is set up within 
   `org.apache.hadoop.hive.ql.txn.compactor.Worker.StatsUpdater#gatherStats`
   method which is not a Compaction Txn. The reason I introduced this flag is to prevent the stats gathering from increasing the write id if it was initiated as a part of a compaction.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
klcopp commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r855867715


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+      //and for duplicate compaction detection it is necessary to not increment it.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));

Review Comment:
   I think you can use driverContext.getTxnType() instead (TxnType.COMPACTION)



##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java:
##########
@@ -797,7 +797,6 @@ public void testCompactStatsGather() throws Exception {
 
     int[][] targetVals2 = {{5, 1, 1}, {5, 2, 2}, {5, 3, 1}, {5, 4, 2}};
     runStatementOnDriver("insert into T partition(p=1,q) " + makeValuesClause(targetVals2));
-

Review Comment:
   Nit: Unnecessary change to this file



##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java:
##########
@@ -122,6 +125,56 @@ public void testCompactionShouldNotFailOnPartitionsWithBooleanField() throws Exc
             "ready for cleaning", compacts.get(0).getState());
   }
 
+  @Test
+  public void secondCompactionShouldBeRefusedBeforeEnqueueing() throws Exception {
+    conf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, true);
+    // Set delta numbuer threshold to 2 to avoid skipping compaction because of too few deltas
+    conf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD, 2);
+    // Set delta percentage to a high value to suppress selecting major compression based on that
+    conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 1000f);

Review Comment:
   These 2 settings aren't necessary since the Initiator uses these thresholds, but in the test we always queue compaction manually



##########
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java:
##########
@@ -38,6 +38,7 @@
 import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsResponse;
 import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
 import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionResponse;

Review Comment:
   Nit: Unused import



##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java:
##########
@@ -235,18 +235,18 @@ private void loadData(boolean isVectorized) throws Exception {
     runStatementOnDriver("export table Tstage to '" + getWarehouseDir() +"/2'");
     runStatementOnDriver("load data inpath '" + getWarehouseDir() + "/2/data' overwrite into table T");
     String[][] expected3 = new String[][] {
-        {"{\"writeid\":5,\"bucketid\":536870912,\"rowid\":0}\t5\t6", "t/base_0000005/000000_0"},

Review Comment:
   Just trying to understand – Why was the writeid 5 originally? And if there was no compaction in the meantime, why is it 4 now?



##########
ql/src/test/queries/clientpositive/acid_insert_overwrite_update.q:
##########
@@ -26,7 +26,6 @@ insert overwrite table sequential_update values(current_timestamp, 0, current_ti
 delete from sequential_update where seq=2;
 select distinct IF(seq==0, 'LOOKS OKAY', 'BROKEN'), regexp_extract(INPUT__FILE__NAME, '.*/(.*)/[^/]*', 1) from sequential_update;
 
-alter table sequential_update compact 'major';

Review Comment:
   Why change this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] veghlaci05 commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
veghlaci05 commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r857326564


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+      //and for duplicate compaction detection it is necessary to not increment it.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));

Review Comment:
   I think you meant 
   `driverContext.getTxnType().equals(TxnType.COMPACTION)`
   otherwise I don't understand your point :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r860697805


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+      //and for duplicate compaction detection it is necessary to not increment it.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));
       Table table = driverContext.getPlan().getAcidAnalyzeTable().getTable();
-      driverContext.getTxnManager().getTableWriteId(table.getDbName(), table.getTableName());
+      if(isWithinCompactionTxn) {

Review Comment:
   could be replaced with  if (driverContext.getCompactionWriteIds() != null) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r861702707


##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactOperation.java:
##########
@@ -53,6 +53,14 @@ public int execute() throws HiveException {
     String partitionName = getPartitionName(table);
 
     CompactionResponse resp = compact(table, partitionName);
+    if(!resp.isAccepted()) {
+      String message = "No detailed message available";

Review Comment:
   move to contants 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] veghlaci05 commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
veghlaci05 commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r857379938


##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java:
##########
@@ -235,18 +235,18 @@ private void loadData(boolean isVectorized) throws Exception {
     runStatementOnDriver("export table Tstage to '" + getWarehouseDir() +"/2'");
     runStatementOnDriver("load data inpath '" + getWarehouseDir() + "/2/data' overwrite into table T");
     String[][] expected3 = new String[][] {
-        {"{\"writeid\":5,\"bucketid\":536870912,\"rowid\":0}\t5\t6", "t/base_0000005/000000_0"},

Review Comment:
   Check line 222
   `runStatementOnDriver("alter table T compact 'major'")`
   Since stats gathering no longer increases the write id, (from now on) neither compactions do



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r861700560


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java:
##########
@@ -46,26 +46,46 @@ private DriverUtils() {
     throw new UnsupportedOperationException("DriverUtils should not be instantiated!");
   }
 
-  public static void runOnDriver(HiveConf conf, String user, SessionState sessionState,
+  @FunctionalInterface
+  private interface DriverCreator {
+    Driver createDriver(QueryState qs);
+  }
+
+  public static void runOnDriver(HiveConf conf, SessionState sessionState,
       String query) throws HiveException {
-    runOnDriver(conf, user, sessionState, query, null, -1);
+    runOnDriver(conf, sessionState, query, null, -1);
   }
 
   /**
    * For Query Based compaction to run the query to generate the compacted data.
    */
-  public static void runOnDriver(HiveConf conf, String user,
+  public static void runOnDriver(HiveConf conf,
       SessionState sessionState, String query, ValidWriteIdList writeIds, long compactorTxnId)
       throws HiveException {
     if(writeIds != null && compactorTxnId < 0) {
       throw new IllegalArgumentException(JavaUtils.txnIdToString(compactorTxnId) +
           " is not valid. Context: " + query);
     }
+    runOnDriverInternal(query, conf, sessionState, (qs) -> new Driver(qs, writeIds, compactorTxnId));
+  }
+
+  /**
+   * For Query Based compaction to run the query to generate the compacted data.

Review Comment:
   is the comment accurate?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r861702386


##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactOperation.java:
##########
@@ -53,6 +53,14 @@ public int execute() throws HiveException {
     String partitionName = getPartitionName(table);
 
     CompactionResponse resp = compact(table, partitionName);
+    if(!resp.isAccepted()) {

Review Comment:
   nit: space



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r860693661


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+      //and for duplicate compaction detection it is necessary to not increment it.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));
       Table table = driverContext.getPlan().getAcidAnalyzeTable().getTable();
-      driverContext.getTxnManager().getTableWriteId(table.getDbName(), table.getTableName());
+      if(isWithinCompactionTxn) {
+        driverContext.getTxnManager().allocateMaxTableWriteId(table.getDbName(), table.getTableName());

Review Comment:
   instead of this could we supply compaction HWM here to avoid db call?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r861546821


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java:
##########
@@ -46,26 +46,46 @@ private DriverUtils() {
     throw new UnsupportedOperationException("DriverUtils should not be instantiated!");
   }
 
-  public static void runOnDriver(HiveConf conf, String user, SessionState sessionState,
+  @FunctionalInterface
+  private interface DriverCreator {
+    Driver createDriver(QueryState qs);
+  }
+
+  public static void runOnDriver(HiveConf conf, SessionState sessionState,
       String query) throws HiveException {
-    runOnDriver(conf, user, sessionState, query, null, -1);
+    runOnDriver(conf, sessionState, query, null, -1);
   }
 
   /**
    * For Query Based compaction to run the query to generate the compacted data.
    */
-  public static void runOnDriver(HiveConf conf, String user,
+  public static void runOnDriver(HiveConf conf,
       SessionState sessionState, String query, ValidWriteIdList writeIds, long compactorTxnId)
       throws HiveException {
     if(writeIds != null && compactorTxnId < 0) {

Review Comment:
   nit: space



##########
ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java:
##########
@@ -46,26 +46,46 @@ private DriverUtils() {
     throw new UnsupportedOperationException("DriverUtils should not be instantiated!");
   }
 
-  public static void runOnDriver(HiveConf conf, String user, SessionState sessionState,
+  @FunctionalInterface
+  private interface DriverCreator {
+    Driver createDriver(QueryState qs);
+  }
+
+  public static void runOnDriver(HiveConf conf, SessionState sessionState,
       String query) throws HiveException {
-    runOnDriver(conf, user, sessionState, query, null, -1);
+    runOnDriver(conf, sessionState, query, null, -1);
   }
 
   /**
    * For Query Based compaction to run the query to generate the compacted data.
    */
-  public static void runOnDriver(HiveConf conf, String user,
+  public static void runOnDriver(HiveConf conf,
       SessionState sessionState, String query, ValidWriteIdList writeIds, long compactorTxnId)
       throws HiveException {
     if(writeIds != null && compactorTxnId < 0) {
       throw new IllegalArgumentException(JavaUtils.txnIdToString(compactorTxnId) +
           " is not valid. Context: " + query);
     }
+    runOnDriverInternal(query, conf, sessionState, (qs) -> new Driver(qs, writeIds, compactorTxnId));
+  }
+
+  /**
+   * For Query Based compaction to run the query to generate the compacted data.
+   */
+  public static void runOnDriver(HiveConf conf, SessionState sessionState, String query, long analyzeTableWriteId)
+      throws HiveException {
+    if(analyzeTableWriteId < 0) {

Review Comment:
   nit: space



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
klcopp commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r857334379


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+      //and for duplicate compaction detection it is necessary to not increment it.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));

Review Comment:
   Yes, exactly!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
klcopp commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r857400842


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+      //and for duplicate compaction detection it is necessary to not increment it.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));

Review Comment:
   I understand, too bad:)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r860693661


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+      //and for duplicate compaction detection it is necessary to not increment it.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));
       Table table = driverContext.getPlan().getAcidAnalyzeTable().getTable();
-      driverContext.getTxnManager().getTableWriteId(table.getDbName(), table.getTableName());
+      if(isWithinCompactionTxn) {
+        driverContext.getTxnManager().allocateMaxTableWriteId(table.getDbName(), table.getTableName());

Review Comment:
   instead of this could we supply compaction HWM here to avoid db call? they should be already present in DriverContext
   ````
   driverContext.setCompactionWriteIds(compactionWriteIds);
   ````



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] veghlaci05 commented on a diff in pull request #3172: HIVE-26107: Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

Posted by GitBox <gi...@apache.org>.
veghlaci05 commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r861716410


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java:
##########
@@ -46,26 +46,46 @@ private DriverUtils() {
     throw new UnsupportedOperationException("DriverUtils should not be instantiated!");
   }
 
-  public static void runOnDriver(HiveConf conf, String user, SessionState sessionState,
+  @FunctionalInterface
+  private interface DriverCreator {
+    Driver createDriver(QueryState qs);
+  }
+
+  public static void runOnDriver(HiveConf conf, SessionState sessionState,
       String query) throws HiveException {
-    runOnDriver(conf, user, sessionState, query, null, -1);
+    runOnDriver(conf, sessionState, query, null, -1);
   }
 
   /**
    * For Query Based compaction to run the query to generate the compacted data.
    */
-  public static void runOnDriver(HiveConf conf, String user,
+  public static void runOnDriver(HiveConf conf,
       SessionState sessionState, String query, ValidWriteIdList writeIds, long compactorTxnId)
       throws HiveException {
     if(writeIds != null && compactorTxnId < 0) {
       throw new IllegalArgumentException(JavaUtils.txnIdToString(compactorTxnId) +
           " is not valid. Context: " + query);
     }
+    runOnDriverInternal(query, conf, sessionState, (qs) -> new Driver(qs, writeIds, compactorTxnId));
+  }
+
+  /**
+   * For Query Based compaction to run the query to generate the compacted data.

Review Comment:
   No, updated it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org