Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/18 12:42:17 UTC

[GitHub] [hudi] SteNicholas opened a new pull request, #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

SteNicholas opened a new pull request, #6991:
URL: https://github.com/apache/hudi/pull/6991

   ### Change Logs
   
   `HoodieCatalog` does not implement `dropPartition` at present. This change adds the implementation, which is useful for scenarios such as backfilling an existing partition.
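   
   A minimal usage sketch (not part of this PR) of how the new `dropPartition` could be invoked through Flink's `Catalog` API; the database name, table name, and partition values below are hypothetical:
   
   ```java
   import org.apache.flink.table.catalog.Catalog;
   import org.apache.flink.table.catalog.CatalogPartitionSpec;
   import org.apache.flink.table.catalog.ObjectPath;
   
   import java.util.Collections;
   
   public class DropPartitionExample {
   
     /** Drops a single daily partition through the Flink Catalog API. */
     public static void dropDailyPartition(Catalog hoodieCatalog) throws Exception {
       ObjectPath tablePath = new ObjectPath("default_database", "hudi_table");
       CatalogPartitionSpec spec =
           new CatalogPartitionSpec(Collections.singletonMap("dt", "2022-10-18"));
       // ignoreIfNotExists = true: return silently when the table or partition is absent
       hoodieCatalog.dropPartition(tablePath, spec, true);
     }
   }
   ```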
   
   ### Impact
   
   Adds the `HoodieCatalog#dropPartition` implementation.
   
   ### Risk level (write none, low, medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1285012825

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 72b46a48f95c59b1911318fc6ce5e73dcbc407c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301) 
   * eb8629f0256b4ad31cd28510b41e32a3af419889 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6991:
URL: https://github.com/apache/hudi/pull/6991#discussion_r1001599929


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java:
##########
@@ -505,7 +552,29 @@ private Map<String, String> applyOptionsHook(String tablePath, Map<String, Strin
     return newOptions;
   }
 
-  private String inferTablePath(String catalogPath, ObjectPath tablePath) {
+  private HoodieFlinkWriteClient<?> createWriteClient(
+      Map<String, String> options,
+      String tablePathStr,
+      ObjectPath tablePath) throws IOException {
+    return StreamerUtil.createWriteClient(
+        Configuration.fromMap(options)
+            .set(FlinkOptions.TABLE_NAME, tablePath.getObjectName())
+            .set(FlinkOptions.SOURCE_AVRO_SCHEMA,
+                StreamerUtil.createMetaClient(tablePathStr, hadoopConf)
+                    .getTableConfig().getTableCreateSchema().get().toString()));
+  }
+
+  @VisibleForTesting
+  protected String inferTablePath(String catalogPath, ObjectPath tablePath) {
     return String.format("%s/%s/%s", catalogPath, tablePath.getDatabaseName(), tablePath.getObjectName());
   }
+
+  private String inferPartitionPath(boolean hiveStylePartitioning, CatalogPartitionSpec catalogPartitionSpec) {
+    return catalogPartitionSpec.getPartitionSpec().entrySet()
+        .stream().map(entry ->

Review Comment:
   Can we move this method into the `HoodieCatalogUtil` utility class?
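   
   A minimal sketch, assuming the semantics shown in the truncated diff above, of what such a method could look like once moved into a utility class; the class name and body here are illustrative, not the actual Hudi code:
   
   ```java
   import org.apache.flink.table.catalog.CatalogPartitionSpec;
   
   import java.util.stream.Collectors;
   
   public final class HoodieCatalogUtilSketch {
   
     private HoodieCatalogUtilSketch() {
     }
   
     /** Joins partition values into a relative path, e.g. "dt=2022-10-18" or "2022-10-18". */
     public static String inferPartitionPath(boolean hiveStylePartitioning, CatalogPartitionSpec spec) {
       return spec.getPartitionSpec().entrySet().stream()
           .map(entry -> hiveStylePartitioning
               ? String.format("%s=%s", entry.getKey(), entry.getValue())
               : entry.getValue())
           .collect(Collectors.joining("/"));
     }
   }
   ```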



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] SteNicholas commented on a diff in pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by "SteNicholas (via GitHub)" <gi...@apache.org>.
SteNicholas commented on code in PR #6991:
URL: https://github.com/apache/hudi/pull/6991#discussion_r1122611901


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java:
##########
@@ -394,7 +408,40 @@ public void createPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPa
   @Override
   public void dropPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPartitionSpec, boolean ignoreIfNotExists)
       throws PartitionNotExistException, CatalogException {
-    throw new UnsupportedOperationException("dropPartition is not implemented.");
+    if (!tableExists(tablePath)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    String tablePathStr = inferTablePath(catalogPathStr, tablePath);
+    Map<String, String> options = TableOptionProperties.loadFromProperties(tablePathStr, hadoopConf);
+    boolean hiveStylePartitioning = Boolean.parseBoolean(options.getOrDefault(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), "false"));
+    String partitionPathStr = HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, catalogPartitionSpec);
+
+    if (!StreamerUtil.partitionExists(tablePathStr, partitionPathStr, hadoopConf)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    // enable auto-commit though ~
+    options.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), "true");
+    try (HoodieFlinkWriteClient<?> writeClient = createWriteClient(options, tablePathStr, tablePath)) {
+      writeClient.deletePartitions(Collections.singletonList(partitionPathStr), HoodieActiveTimeline.createNewInstantTime())
+          .forEach(writeStatus -> {
+            if (writeStatus.hasErrors()) {
+              throw new HoodieMetadataException(String.format("Failed to commit metadata table records at file id %s.", writeStatus.getFileId()));
+            }
+          });
+      fs.delete(new Path(tablePathStr, partitionPathStr), true);

Review Comment:
   @TengHuo, like `HoodieHiveCatalog`, the `dropPartition` operation needs to drop both the partition metadata and the directory on the filesystem. Otherwise, as you mentioned, a subsequent insert operation would leave invalid data files in the dropped partition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] SteNicholas commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1288566933

   @danny0405, I have addressed the above comments. PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] SteNicholas commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1286383820

   @danny0405, I have applied the above patch and fixed the `HoodieHiveCatalog`. PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1285745185

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fbb4c300133ad1c677826b836652152032b4c616",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fbb4c300133ad1c677826b836652152032b4c616",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 72b46a48f95c59b1911318fc6ce5e73dcbc407c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301) 
   * eb8629f0256b4ad31cd28510b41e32a3af419889 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375) 
   * fbb4c300133ad1c677826b836652152032b4c616 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 merged pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
danny0405 merged PR #6991:
URL: https://github.com/apache/hudi/pull/6991


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] SteNicholas commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1288750198

   @danny0405, I have applied the above patch and removed the unused import. PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1285007395

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 72b46a48f95c59b1911318fc6ce5e73dcbc407c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301) 
   * eb8629f0256b4ad31cd28510b41e32a3af419889 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1288758812

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fbb4c300133ad1c677826b836652152032b4c616",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392",
       "triggerID" : "fbb4c300133ad1c677826b836652152032b4c616",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12521",
       "triggerID" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fe64bb4263dee4e7ce1bed5ac62bf0a12347fd64",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fe64bb4263dee4e7ce1bed5ac62bf0a12347fd64",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fbb4c300133ad1c677826b836652152032b4c616 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392) 
   * 4d799a4b93f7489169b3ab0e7f37f0013e4693f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12521) 
   * fe64bb4263dee4e7ce1bed5ac62bf0a12347fd64 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6991:
URL: https://github.com/apache/hudi/pull/6991#discussion_r1001598983


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##########
@@ -906,4 +987,23 @@ private Map<String, String> supplementOptions(
       return newOptions;
     }
   }
+
+  private HoodieFlinkWriteClient<?> createWriteClient(
+      String tablePathStr,
+      ObjectPath tablePath) throws Exception {
+    Map<String, String> options = supplementOptions(tablePath, translateSparkTable2Flink(tablePath, getHiveTable(tablePath)).getParameters());
+    // enable auto-commit though ~

Review Comment:
   Just invoke `#getTable` to fetch the catalog table first; we can then use the table options.
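   
   A minimal sketch of this suggestion, assuming a standard Flink `Catalog` instance: fetch the catalog table via `getTable` and reuse its options when building the write-client configuration. The class and method names below are illustrative, not the actual Hudi code:
   
   ```java
   import org.apache.flink.table.catalog.Catalog;
   import org.apache.flink.table.catalog.CatalogBaseTable;
   import org.apache.flink.table.catalog.ObjectPath;
   import org.apache.hudi.config.HoodieWriteConfig;
   
   import java.util.HashMap;
   import java.util.Map;
   
   public class WriteClientOptionsSketch {
   
     /** Builds write-client options from the already-resolved catalog table. */
     public static Map<String, String> writeOptions(Catalog catalog, ObjectPath tablePath) throws Exception {
       CatalogBaseTable catalogTable = catalog.getTable(tablePath); // reuse the catalog's own table lookup
       Map<String, String> options = new HashMap<>(catalogTable.getOptions());
       // enable auto-commit for the one-shot delete-partitions call (mirrors the diff above)
       options.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), "true");
       return options;
     }
   }
   ```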



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1288765795

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fbb4c300133ad1c677826b836652152032b4c616",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392",
       "triggerID" : "fbb4c300133ad1c677826b836652152032b4c616",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12521",
       "triggerID" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fe64bb4263dee4e7ce1bed5ac62bf0a12347fd64",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12522",
       "triggerID" : "fe64bb4263dee4e7ce1bed5ac62bf0a12347fd64",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4d799a4b93f7489169b3ab0e7f37f0013e4693f1 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12521) 
   * fe64bb4263dee4e7ce1bed5ac62bf0a12347fd64 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12522) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1289368960

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fbb4c300133ad1c677826b836652152032b4c616",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392",
       "triggerID" : "fbb4c300133ad1c677826b836652152032b4c616",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12521",
       "triggerID" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fe64bb4263dee4e7ce1bed5ac62bf0a12347fd64",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12522",
       "triggerID" : "fe64bb4263dee4e7ce1bed5ac62bf0a12347fd64",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fe64bb4263dee4e7ce1bed5ac62bf0a12347fd64 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12522) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1289925035

   The failed test case for `TestHoodieLogFormat` is unrelated to this change, and I have tested it locally. Will merge it soon ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] SteNicholas commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1288549663

   @danny0405, I have addressed the above comments. PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] TengHuo commented on a diff in pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by "TengHuo (via GitHub)" <gi...@apache.org>.
TengHuo commented on code in PR #6991:
URL: https://github.com/apache/hudi/pull/6991#discussion_r1122565966


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java:
##########
@@ -394,7 +408,40 @@ public void createPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPa
   @Override
   public void dropPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPartitionSpec, boolean ignoreIfNotExists)
       throws PartitionNotExistException, CatalogException {
-    throw new UnsupportedOperationException("dropPartition is not implemented.");
+    if (!tableExists(tablePath)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    String tablePathStr = inferTablePath(catalogPathStr, tablePath);
+    Map<String, String> options = TableOptionProperties.loadFromProperties(tablePathStr, hadoopConf);
+    boolean hiveStylePartitioning = Boolean.parseBoolean(options.getOrDefault(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), "false"));
+    String partitionPathStr = HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, catalogPartitionSpec);
+
+    if (!StreamerUtil.partitionExists(tablePathStr, partitionPathStr, hadoopConf)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    // enable auto-commit though ~
+    options.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), "true");
+    try (HoodieFlinkWriteClient<?> writeClient = createWriteClient(options, tablePathStr, tablePath)) {
+      writeClient.deletePartitions(Collections.singletonList(partitionPathStr), HoodieActiveTimeline.createNewInstantTime())
+          .forEach(writeStatus -> {
+            if (writeStatus.hasErrors()) {
+              throw new HoodieMetadataException(String.format("Failed to commit metadata table records at file id %s.", writeStatus.getFileId()));
+            }
+          });
+      fs.delete(new Path(tablePathStr, partitionPathStr), true);

Review Comment:
   Hi
   
   May I ask why we need to do `fs.delete` here? Will it cause any problem?
   
   I referred to the code in `HoodieSparkSqlWriter.scala` and `SparkDeletePartitionCommitActionExecutor.java`; there is no path deletion operation on the Spark side.
   
   https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L270
   
   https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkDeletePartitionCommitActionExecutor.java#L62
   
   cc @voonhous 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6991:
URL: https://github.com/apache/hudi/pull/6991#discussion_r1001593967


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##########
@@ -488,8 +495,9 @@ private void initTableIfNotExists(ObjectPath tablePath, CatalogTable catalogTabl
     }
   }
 
-  private String inferTablePath(ObjectPath tablePath, CatalogBaseTable table) {
-    String location = table.getOptions().getOrDefault(PATH.key(), "");
+  @VisibleForTesting
+  public String inferTablePath(ObjectPath tablePath, CatalogBaseTable table) {

Review Comment:
   Revert this change; the `table` should never be null, and we should respect the path from the table options.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1285754801

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fbb4c300133ad1c677826b836652152032b4c616",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392",
       "triggerID" : "fbb4c300133ad1c677826b836652152032b4c616",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * eb8629f0256b4ad31cd28510b41e32a3af419889 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375) 
   * fbb4c300133ad1c677826b836652152032b4c616 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] SteNicholas commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1288724249

   @danny0405, I have applied the above patch. Thanks for providing it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1283427751

   Thanks for the contribution. I have reviewed it and attached a patch here:
   [5049.patch.zip](https://github.com/apache/hudi/files/9816915/5049.patch.zip)
   
   Can you also fix the `HoodieHiveCatalog`?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1282911960

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 72b46a48f95c59b1911318fc6ce5e73dcbc407c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1288645848

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fbb4c300133ad1c677826b836652152032b4c616",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392",
       "triggerID" : "fbb4c300133ad1c677826b836652152032b4c616",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fbb4c300133ad1c677826b836652152032b4c616 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392) 
   * 4d799a4b93f7489169b3ab0e7f37f0013e4693f1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1288661437

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fbb4c300133ad1c677826b836652152032b4c616",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392",
       "triggerID" : "fbb4c300133ad1c677826b836652152032b4c616",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12521",
       "triggerID" : "4d799a4b93f7489169b3ab0e7f37f0013e4693f1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fbb4c300133ad1c677826b836652152032b4c616 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392) 
   * 4d799a4b93f7489169b3ab0e7f37f0013e4693f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12521) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] SteNicholas commented on a diff in pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by "SteNicholas (via GitHub)" <gi...@apache.org>.
SteNicholas commented on code in PR #6991:
URL: https://github.com/apache/hudi/pull/6991#discussion_r1122611901


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java:
##########
@@ -394,7 +408,40 @@ public void createPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPa
   @Override
   public void dropPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPartitionSpec, boolean ignoreIfNotExists)
       throws PartitionNotExistException, CatalogException {
-    throw new UnsupportedOperationException("dropPartition is not implemented.");
+    if (!tableExists(tablePath)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    String tablePathStr = inferTablePath(catalogPathStr, tablePath);
+    Map<String, String> options = TableOptionProperties.loadFromProperties(tablePathStr, hadoopConf);
+    boolean hiveStylePartitioning = Boolean.parseBoolean(options.getOrDefault(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), "false"));
+    String partitionPathStr = HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, catalogPartitionSpec);
+
+    if (!StreamerUtil.partitionExists(tablePathStr, partitionPathStr, hadoopConf)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    // enable auto-commit though ~
+    options.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), "true");
+    try (HoodieFlinkWriteClient<?> writeClient = createWriteClient(options, tablePathStr, tablePath)) {
+      writeClient.deletePartitions(Collections.singletonList(partitionPathStr), HoodieActiveTimeline.createNewInstantTime())
+          .forEach(writeStatus -> {
+            if (writeStatus.hasErrors()) {
+              throw new HoodieMetadataException(String.format("Failed to commit metadata table records at file id %s.", writeStatus.getFileId()));
+            }
+          });
+      fs.delete(new Path(tablePathStr, partitionPathStr), true);

Review Comment:
   @TengHuo, like `HoodieHiveCatalog`, the `dropPartition` operation needs to drop both the partition metadata and the directory on the filesystem. Otherwise, as you mentioned, a subsequent insert operation would leave invalid data files in the dropped partition, with no cleaner to remove them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] SteNicholas commented on a diff in pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by "SteNicholas (via GitHub)" <gi...@apache.org>.
SteNicholas commented on code in PR #6991:
URL: https://github.com/apache/hudi/pull/6991#discussion_r1128990648


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java:
##########
@@ -394,7 +408,40 @@ public void createPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPa
   @Override
   public void dropPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPartitionSpec, boolean ignoreIfNotExists)
       throws PartitionNotExistException, CatalogException {
-    throw new UnsupportedOperationException("dropPartition is not implemented.");
+    if (!tableExists(tablePath)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    String tablePathStr = inferTablePath(catalogPathStr, tablePath);
+    Map<String, String> options = TableOptionProperties.loadFromProperties(tablePathStr, hadoopConf);
+    boolean hiveStylePartitioning = Boolean.parseBoolean(options.getOrDefault(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), "false"));
+    String partitionPathStr = HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, catalogPartitionSpec);
+
+    if (!StreamerUtil.partitionExists(tablePathStr, partitionPathStr, hadoopConf)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    // enable auto-commit though ~
+    options.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), "true");
+    try (HoodieFlinkWriteClient<?> writeClient = createWriteClient(options, tablePathStr, tablePath)) {
+      writeClient.deletePartitions(Collections.singletonList(partitionPathStr), HoodieActiveTimeline.createNewInstantTime())
+          .forEach(writeStatus -> {
+            if (writeStatus.hasErrors()) {
+              throw new HoodieMetadataException(String.format("Failed to commit metadata table records at file id %s.", writeStatus.getFileId()));
+            }
+          });
+      fs.delete(new Path(tablePathStr, partitionPathStr), true);

Review Comment:
   @TengHuo, the invalid data files left after dropping a partition are dirty data. Also, why should the `dropPartition` behavior be kept consistent between Flink and Spark? If it should, IMO, a unified interface to drop partitions should be provided.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java:
##########
@@ -394,7 +408,40 @@ public void createPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPa
   @Override
   public void dropPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPartitionSpec, boolean ignoreIfNotExists)
       throws PartitionNotExistException, CatalogException {
-    throw new UnsupportedOperationException("dropPartition is not implemented.");
+    if (!tableExists(tablePath)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    String tablePathStr = inferTablePath(catalogPathStr, tablePath);
+    Map<String, String> options = TableOptionProperties.loadFromProperties(tablePathStr, hadoopConf);
+    boolean hiveStylePartitioning = Boolean.parseBoolean(options.getOrDefault(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), "false"));
+    String partitionPathStr = HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, catalogPartitionSpec);
+
+    if (!StreamerUtil.partitionExists(tablePathStr, partitionPathStr, hadoopConf)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    // enable auto-commit though ~
+    options.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), "true");
+    try (HoodieFlinkWriteClient<?> writeClient = createWriteClient(options, tablePathStr, tablePath)) {
+      writeClient.deletePartitions(Collections.singletonList(partitionPathStr), HoodieActiveTimeline.createNewInstantTime())
+          .forEach(writeStatus -> {
+            if (writeStatus.hasErrors()) {
+              throw new HoodieMetadataException(String.format("Failed to commit metadata table records at file id %s.", writeStatus.getFileId()));
+            }
+          });
+      fs.delete(new Path(tablePathStr, partitionPathStr), true);

Review Comment:
   @TengHuo, the invalid data files left after dropping a partition are dirty data. Also, why should the `dropPartition` behavior be kept consistent between Flink and Spark? If it should, IMO, a unified interface to drop partitions should be provided.
   cc @voonhous 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1286436827

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12375",
       "triggerID" : "eb8629f0256b4ad31cd28510b41e32a3af419889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fbb4c300133ad1c677826b836652152032b4c616",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392",
       "triggerID" : "fbb4c300133ad1c677826b836652152032b4c616",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fbb4c300133ad1c677826b836652152032b4c616 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12392) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6991:
URL: https://github.com/apache/hudi/pull/6991#discussion_r1001594436


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##########
@@ -501,6 +509,13 @@ private String inferTablePath(ObjectPath tablePath, CatalogBaseTable table) {
     return location;
   }
 
+  @VisibleForTesting
+  public String inferPartitionPath(CatalogPartitionSpec catalogPartitionSpec) {
+    return catalogPartitionSpec.getPartitionSpec().entrySet()
+        .stream().map(entry -> String.format("%s=%s", entry.getKey(), entry.getValue()))

Review Comment:
   What about hive-style partitioning? The `key=value` form should only be used when hive-style partitioning is enabled.
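   
   For context, a minimal sketch (not the committed code) of how the helper could honor hive-style partitioning, mirroring the `HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, catalogPartitionSpec)` call shown earlier in this thread. It emits `dt=2022-10-18` segments when the option is enabled and plain `2022-10-18` segments otherwise:
   
   ```java
   import java.util.stream.Collectors;
   
   import org.apache.flink.table.catalog.CatalogPartitionSpec;
   
   // Sketch of a partition-path helper that toggles hive-style partitioning.
   public final class PartitionPathSketch {
   
     public static String inferPartitionPath(boolean hiveStylePartitioning, CatalogPartitionSpec spec) {
       return spec.getPartitionSpec().entrySet().stream()
           .map(entry -> hiveStylePartitioning
               ? String.format("%s=%s", entry.getKey(), entry.getValue())
               : entry.getValue())
           .collect(Collectors.joining("/"));
     }
   }
   ```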





[GitHub] [hudi] SteNicholas commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1282333658

   @danny0405, could you please help to review this pull request?
   




[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1282385496

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 72b46a48f95c59b1911318fc6ce5e73dcbc407c4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12301) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1282376707

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "72b46a48f95c59b1911318fc6ce5e73dcbc407c4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 72b46a48f95c59b1911318fc6ce5e73dcbc407c4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #6991:
URL: https://github.com/apache/hudi/pull/6991#issuecomment-1288690781

   [5049.patch.zip](https://github.com/apache/hudi/files/9850345/5049.patch.zip)
   Thanks for the contribution, I have reviewed and applied a patch.




[GitHub] [hudi] TengHuo commented on a diff in pull request #6991: [HUDI-5049] HoodieCatalog supports the implementation of dropPartition

Posted by "TengHuo (via GitHub)" <gi...@apache.org>.
TengHuo commented on code in PR #6991:
URL: https://github.com/apache/hudi/pull/6991#discussion_r1126150688


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java:
##########
@@ -394,7 +408,40 @@ public void createPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPa
   @Override
   public void dropPartition(ObjectPath tablePath, CatalogPartitionSpec catalogPartitionSpec, boolean ignoreIfNotExists)
       throws PartitionNotExistException, CatalogException {
-    throw new UnsupportedOperationException("dropPartition is not implemented.");
+    if (!tableExists(tablePath)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    String tablePathStr = inferTablePath(catalogPathStr, tablePath);
+    Map<String, String> options = TableOptionProperties.loadFromProperties(tablePathStr, hadoopConf);
+    boolean hiveStylePartitioning = Boolean.parseBoolean(options.getOrDefault(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), "false"));
+    String partitionPathStr = HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, catalogPartitionSpec);
+
+    if (!StreamerUtil.partitionExists(tablePathStr, partitionPathStr, hadoopConf)) {
+      if (ignoreIfNotExists) {
+        return;
+      } else {
+        throw new PartitionNotExistException(getName(), tablePath, catalogPartitionSpec);
+      }
+    }
+
+    // enable auto-commit though ~
+    options.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), "true");
+    try (HoodieFlinkWriteClient<?> writeClient = createWriteClient(options, tablePathStr, tablePath)) {
+      writeClient.deletePartitions(Collections.singletonList(partitionPathStr), HoodieActiveTimeline.createNewInstantTime())
+          .forEach(writeStatus -> {
+            if (writeStatus.hasErrors()) {
+              throw new HoodieMetadataException(String.format("Failed to commit metadata table records at file id %s.", writeStatus.getFileId()));
+            }
+          });
+      fs.delete(new Path(tablePathStr, partitionPathStr), true);

Review Comment:
   Got it.
   
   So, if I understand correctly, on the Spark side there will be invalid data files left in the dropped partition if there is an insert operation and no cleaner to clean up the data files. Am I right?
   
   And may I ask what exactly the invalid data files issue is? Do you have a ticket for it?
   
   Voon and I are checking the code around `drop partitions`; we may fix it if there is any issue.
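   
   As a side note, a minimal sketch (using only the standard Hadoop FileSystem API, not Hudi-specific code) of how one could check whether a dropped partition still has leftover data files on storage:
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileStatus;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;
   
   import java.io.IOException;
   
   // Sketch: report whether any data files remain under a partition directory
   // after the partition was dropped (leftovers would be the "invalid" files).
   public final class DroppedPartitionCheck {
   
     public static boolean hasLeftoverFiles(String tablePath, String partitionPath, Configuration conf) throws IOException {
       Path partitionDir = new Path(tablePath, partitionPath);
       FileSystem fs = partitionDir.getFileSystem(conf);
       if (!fs.exists(partitionDir)) {
         return false; // directory already removed, nothing left behind
       }
       for (FileStatus status : fs.listStatus(partitionDir)) {
         if (status.isFile()) {
           return true; // data files still present under the dropped partition
         }
       }
       return false;
     }
   }
   ```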


