You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/03/31 06:53:55 UTC

[GitHub] [hive] klcopp commented on a change in pull request #2073: HIVE-24828: Provide new HMS API to return latest committed compaction…

klcopp commented on a change in pull request #2073:
URL: https://github.com/apache/hive/pull/2073#discussion_r604639140



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FilterUtils.java
##########
@@ -442,4 +447,35 @@ public static void checkDbAndTableFilters(boolean isFilterEnabled,
       return compactions;
     }
   }
+
+  public static GetLatestCompactionInfoResponse filterLatestCompactionInfoIfEnabled(
+      boolean isFilterEnabled, MetaStoreFilterHook filterHook,
+      String catName, String dbName, String tableName, GetLatestCompactionInfoResponse response
+  ) throws MetaException {
+    if (isFilterEnabled && response.getCompactionsSize() > 0) {
+      List<LatestCompactionInfo> compactions = response.getCompactions();
+      if (compactions.get(0).getPartitionname() == null) {
+        // non partitioned table
+        List<String> tableNames = new ArrayList<>();
+        tableNames.add(tableName);
+        tableNames = filterHook.filterTableNames(catName, dbName, tableNames);
+        if (tableNames.isEmpty()) {
+          response.unsetCompactions();

Review comment:
       Where is this implemented?

##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1254,6 +1254,22 @@ struct ShowCompactResponse {
     1: required list<ShowCompactResponseElement> compacts,
 }
 
+struct GetLatestCompactionInfoRequest {

Review comment:
       The name of this function seems like it's requesting the last successful/failed/any status compaction for the table/partition. I think it should be clear that only successful and ready for cleaning statuses matter here. How about LatestFinishedCompaction, LatestDataUpdateByCompaction ... something like that?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -3646,6 +3649,86 @@ public ShowCompactResponse showCompact(ShowCompactRequest rqst) throws MetaExcep
     }
   }
 
+  /**
+   * We assume this is only called by metadata cache server to know if there are new base/delta files should be read.
+   * The query filters compactions by state and only returns SUCCEEDED or READY_FOR_CLEANING compactions because
+   * only these two states means there are new files ready to be read.
+   */
+  @RetrySemantics.ReadOnly
+  public GetLatestCompactionInfoResponse getLatestCompactionInfo(GetLatestCompactionInfoRequest rqst)
+      throws MetaException {
+    GetLatestCompactionInfoResponse response = new GetLatestCompactionInfoResponse(new ArrayList<>());
+    Connection dbConn = null;
+    PreparedStatement pst = null;
+    ResultSet rs = null;
+    try {
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+
+        List<String> params = new ArrayList<>();
+        // This query combines the result sets of SUCCEEDED compactions and READY_FOR_CLEANING compactions
+        // We also sort the result by CC_ID in descending order so that we can keep only the latest record
+        // according to the order in result set
+        StringBuilder sb = new StringBuilder()
+            .append("SELECT * FROM (")

Review comment:
       I'm not sure about this but could you do: select max(id) and group by table & partition?

##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1254,6 +1254,22 @@ struct ShowCompactResponse {
     1: required list<ShowCompactResponseElement> compacts,
 }
 
+struct GetLatestCompactionInfoRequest {
+    1: required string dbname,
+    2: required string tablename,
+    3: optional list<string> partitionnames,
+}
+
+struct LatestCompactionInfo {

Review comment:
       CompactionInfoStruct will do the job and you can populate it with more information which might be useful in the future.

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -3646,6 +3649,86 @@ public ShowCompactResponse showCompact(ShowCompactRequest rqst) throws MetaExcep
     }
   }
 
+  /**
+   * We assume this is only called by metadata cache server to know if there are new base/delta files should be read.
+   * The query filters compactions by state and only returns SUCCEEDED or READY_FOR_CLEANING compactions because
+   * only these two states means there are new files ready to be read.
+   */
+  @RetrySemantics.ReadOnly
+  public GetLatestCompactionInfoResponse getLatestCompactionInfo(GetLatestCompactionInfoRequest rqst)
+      throws MetaException {
+    GetLatestCompactionInfoResponse response = new GetLatestCompactionInfoResponse(new ArrayList<>());
+    Connection dbConn = null;
+    PreparedStatement pst = null;
+    ResultSet rs = null;
+    try {
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+
+        List<String> params = new ArrayList<>();
+        // This query combines the result sets of SUCCEEDED compactions and READY_FOR_CLEANING compactions
+        // We also sort the result by CC_ID in descending order so that we can keep only the latest record
+        // according to the order in result set
+        StringBuilder sb = new StringBuilder()
+            .append("SELECT * FROM (")
+            .append("   SELECT")
+            .append("   \"CC_ID\", \"CC_DATABASE\", \"CC_TABLE\", \"CC_PARTITION\", \"CC_TYPE\"")
+            .append("   FROM \"COMPLETED_COMPACTIONS\"")
+            .append("     WHERE \"CC_STATE\" = " + quoteChar(SUCCEEDED_STATE))
+            .append("   UNION ALL")
+            .append("   SELECT")
+            .append("   \"CQ_ID\" AS \"CC_ID\", \"CQ_DATABASE\" AS \"CC_DATABASE\"")
+            .append("   ,\"CQ_TABLE\" AS \"CC_TABLE\", \"CQ_PARTITION\" AS \"CC_PARTITION\"")
+            .append("   ,\"CQ_TYPE\" AS \"CC_TYPE\"")
+            .append("   FROM \"COMPACTION_QUEUE\"")
+            .append("     WHERE \"CQ_STATE\" = " + quoteChar(READY_FOR_CLEANING))
+            .append(") AS compactions ")
+            .append(" WHERE \"CC_DATABASE\"=? AND \"CC_TABLE\"=?");
+        params.add(rqst.getDbname());
+        params.add(rqst.getTablename());
+        if (rqst.getPartitionnames() != null) {

Review comment:
       or list size == 0




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org