Posted to commits@hudi.apache.org by "hbgstc123 (via GitHub)" <gi...@apache.org> on 2023/04/22 15:13:37 UTC

[GitHub] [hudi] hbgstc123 opened a new pull request, #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

hbgstc123 opened a new pull request, #8546:
URL: https://github.com/apache/hudi/pull/8546

   …ing.
   
   ### Change Logs
   
   If online compaction or clustering in a Flink pipeline fails, it is hard to locate which TaskManager holds the error log needed for troubleshooting.
   This PR adds a warning log with the task ID of the failed compaction/clustering event, so we know which TaskManager log to check.
   
   ### Impact
   
   no
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   no
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1177668814


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.java:
##########
@@ -101,6 +101,12 @@ public void open(Configuration parameters) throws Exception {
   @Override
   public void invoke(CompactionCommitEvent event, Context context) throws Exception {
     final String instant = event.getInstant();
+    if (event.isFailed() || !this.conf.getBoolean(FlinkOptions.IGNORE_FAILED)
+            && event.getWriteStatuses().stream().anyMatch(writeStatus -> writeStatus.getTotalErrorRecords() > 0)) {
+      LOG.warn("Receive abnormal CompactionCommitEvent of instant " + instant + ", task ID is " + event.getTaskID()

Review Comment:
   done





Re: [PR] [MINOR] Add log in flink compact/cluster commit sink for troubleshoot… [hudi]

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-2019257898

   @danny0405 is this still needed?




[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1177288415


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -150,6 +159,13 @@ private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<
     long numErrorRecords = statuses.stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum).orElse(0L);
 
     if (numErrorRecords > 0 && !this.conf.getBoolean(FlinkOptions.IGNORE_FAILED)) {
+      events.forEach(event -> {
+        Option<Long> failRecordCnt = Option.fromJavaOptional(
+                event.getWriteStatuses().stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum));
+        if (failRecordCnt.isPresent() && failRecordCnt.get() > 0) {
+          LOG.warn("Clustering event with failed record num: " + failRecordCnt + " from task " + event.getTaskID());

Review Comment:
   good idea, done, please take another look





[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1523195141

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2914fd9a3052f735733c8a212644918349943618",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606",
       "triggerID" : "2914fd9a3052f735733c8a212644918349943618",
       "triggerType" : "PUSH"
     }, {
       "hash" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16664",
       "triggerID" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8bf018b22a413e93fb8773605166eede0cf0da62",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8bf018b22a413e93fb8773605166eede0cf0da62",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2914fd9a3052f735733c8a212644918349943618 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606) 
   * 71bf09ba661b1a296233f14f116894c83788fc64 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16664) 
   * 8bf018b22a413e93fb8773605166eede0cf0da62 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1174732825


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -120,6 +120,11 @@ private void commitIfNecessary(String instant, List<ClusteringCommitEvent> event
     }
 
     if (events.stream().anyMatch(ClusteringCommitEvent::isFailed)) {
+      events.forEach(event -> {
+        if (event.isFailed()) {
+          LOG.warn("Failed clustering event from task " + event.getTaskID() + ", rollback instant " + instant);

Review Comment:
   There is no need to iterate over the events again; just do it in the anyMatch block.
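
One thing worth checking with this suggestion: `Stream.anyMatch` is a short-circuiting operation, so a log statement placed inside its predicate may fire only for the first failed event. The sketch below is a minimal illustration under stated assumptions: `FakeEvent` is a hypothetical stand-in for `ClusteringCommitEvent`, and adding to a list stands in for `LOG.warn(...)`. It is not the code from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for ClusteringCommitEvent; not the real Hudi class.
final class FakeEvent {
  final int taskId;
  final boolean failed;

  FakeEvent(int taskId, boolean failed) {
    this.taskId = taskId;
    this.failed = failed;
  }
}

final class AnyMatchLoggingSketch {
  // Records a task ID inside the anyMatch predicate (stands in for LOG.warn).
  static List<Integer> loggedByAnyMatch(List<FakeEvent> events) {
    List<Integer> logged = new ArrayList<>();
    events.stream().anyMatch(e -> {
      if (e.failed) {
        logged.add(e.taskId);
      }
      return e.failed;
    });
    return logged;
  }

  // Filters the failed events first, then records every one of them.
  static List<Integer> loggedByFilter(List<FakeEvent> events) {
    List<Integer> logged = new ArrayList<>();
    events.stream().filter(e -> e.failed).forEach(e -> logged.add(e.taskId));
    return logged;
  }

  public static void main(String[] args) {
    List<FakeEvent> events = Arrays.asList(
        new FakeEvent(0, false), new FakeEvent(1, true), new FakeEvent(2, true));
    System.out.println("anyMatch logged: " + loggedByAnyMatch(events));
    System.out.println("filter logged:   " + loggedByFilter(events));
  }
}
```

If every failed task ID needs to appear in the log, the filter-then-log form is the safer of the two; logging inside `anyMatch` is enough when one failed task ID suffices.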





[GitHub] [hudi] danny0405 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1177310563


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.java:
##########
@@ -101,6 +101,12 @@ public void open(Configuration parameters) throws Exception {
   @Override
   public void invoke(CompactionCommitEvent event, Context context) throws Exception {
     final String instant = event.getInstant();
+    if (event.isFailed() || !this.conf.getBoolean(FlinkOptions.IGNORE_FAILED)
+            && event.getWriteStatuses().stream().anyMatch(writeStatus -> writeStatus.getTotalErrorRecords() > 0)) {
+      LOG.warn("Receive abnormal CompactionCommitEvent of instant " + instant + ", task ID is " + event.getTaskID()

Review Comment:
   Can remove this `|| !this.conf.getBoolean(FlinkOptions.IGNORE_FAILED`
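
For reference when reading the condition in the hunk above: in Java `&&` binds tighter than `||`, so `a || !b && c` is evaluated as `a || (!b && c)`. The sketch below is only an illustration of that grouping; the parameter names are hypothetical placeholders for `event.isFailed()`, the `FlinkOptions.IGNORE_FAILED` setting, and the error-record check.

```java
final class ConditionPrecedenceSketch {
  // Same shape as the condition in the diff: a || !b && c
  static boolean asWritten(boolean failed, boolean ignoreFailed, boolean hasErrorRecords) {
    return failed || !ignoreFailed && hasErrorRecords;
  }

  // The grouping Java applies, written out with explicit parentheses.
  static boolean explicit(boolean failed, boolean ignoreFailed, boolean hasErrorRecords) {
    return failed || (!ignoreFailed && hasErrorRecords);
  }

  // Checks that both forms agree on all eight input combinations.
  static boolean agreeOnAllInputs() {
    boolean[] values = {false, true};
    for (boolean failed : values) {
      for (boolean ignoreFailed : values) {
        for (boolean hasErrorRecords : values) {
          if (asWritten(failed, ignoreFailed, hasErrorRecords)
              != explicit(failed, ignoreFailed, hasErrorRecords)) {
            return false;
          }
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("forms agree: " + agreeOnAllInputs());
  }
}
```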





[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1518692946

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7fda5fac80a5466d7fccfeb47c82297c54b7988d UNKNOWN
   




[GitHub] [hudi] danny0405 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1176706593


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -150,6 +159,13 @@ private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<
     long numErrorRecords = statuses.stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum).orElse(0L);
 
     if (numErrorRecords > 0 && !this.conf.getBoolean(FlinkOptions.IGNORE_FAILED)) {
+      events.forEach(event -> {
+        Option<Long> failRecordCnt = Option.fromJavaOptional(
+                event.getWriteStatuses().stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum));
+        if (failRecordCnt.isPresent() && failRecordCnt.get() > 0) {
+          LOG.warn("Clustering event with failed record num: " + failRecordCnt + " from task " + event.getTaskID());

Review Comment:
   The write task already logs some messages before it sends the events. Does that make sense to you?





[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1176619896


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -141,6 +141,15 @@ private void commitIfNecessary(String instant, List<ClusteringCommitEvent> event
     }
   }
 
+  private boolean containFailedEvent(List<ClusteringCommitEvent> events) {
+    return events.stream().anyMatch(event -> {
+      if (event.isFailed()) {
+        LOG.warn("Failed clustering event from task " + event.getTaskID() + ", will rollback instant " + event.getInstant());
+      }

Review Comment:
   Yes, logging the failed task ID helps find the error that failed the event.





Re: [PR] [MINOR] Add log in flink compact/cluster commit sink for troubleshoot… [hudi]

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-2019260900

   yeah, @hbgstc123 can you rebase with the latest master?




[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1178594690


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -97,6 +97,11 @@ public void open(Configuration parameters) throws Exception {
   @Override
   public void invoke(ClusteringCommitEvent event, Context context) throws Exception {
     final String instant = event.getInstant();
+    if (event.isFailed() || event.getWriteStatuses().stream().anyMatch(writeStatus -> writeStatus.getTotalErrorRecords() > 0)) {
+      LOG.warn("Receive abnormal ClusteringCommitEvent of instant " + instant + ", task ID is " + event.getTaskID()
+              + ", is failed: " + event.isFailed() + ", error record count: "

Review Comment:
   If "an exception is thrown with empty write statuses", the second condition `event.getWriteStatuses().stream().anyMatch(writeStatus -> writeStatus.getTotalErrorRecords() > 0)` will be false.
    In that scenario we need the condition `event.isFailed()` to trigger the log, so it seems we still need `event.isFailed()` here?
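
The argument rests on `anyMatch` returning false for an empty stream. A minimal sketch of that case, assuming plain `Long` values as stand-ins for the per-`WriteStatus` `getTotalErrorRecords()` results (the class and method names here are hypothetical, not Hudi code):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class AbnormalEventSketch {
  // errorCounts stands in for the getTotalErrorRecords() value of each write status.
  static boolean hasErrorRecords(List<Long> errorCounts) {
    return errorCounts.stream().anyMatch(count -> count > 0);
  }

  // Mirrors the shape of the condition discussed above: failed flag OR error records.
  static boolean shouldWarn(boolean failed, List<Long> errorCounts) {
    return failed || hasErrorRecords(errorCounts);
  }

  public static void main(String[] args) {
    List<Long> noStatuses = Collections.emptyList();
    // An event that failed with an exception and carries no write statuses.
    System.out.println("error-record check alone: " + hasErrorRecords(noStatuses));
    System.out.println("with failed flag:         " + shouldWarn(true, noStatuses));
    // An event that did not fail outright but carries error records.
    System.out.println("error records only:       " + shouldWarn(false, Arrays.asList(0L, 3L)));
  }
}
```

Without the failed flag, the empty-status case would not trigger the warning at all.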





[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1176627873


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -150,6 +159,13 @@ private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<
     long numErrorRecords = statuses.stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum).orElse(0L);
 
     if (numErrorRecords > 0 && !this.conf.getBoolean(FlinkOptions.IGNORE_FAILED)) {
+      events.forEach(event -> {
+        Option<Long> failRecordCnt = Option.fromJavaOptional(
+                event.getWriteStatuses().stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum));
+        if (failRecordCnt.isPresent() && failRecordCnt.get() > 0) {
+          LOG.warn("Clustering event with failed record num: " + failRecordCnt + " from task " + event.getTaskID());

Review Comment:
   The purpose is to find the task IDs that contain failed writes, and to log the number of failed records along the way.
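
A rough sketch of that idea, assuming a map from task ID to per-write-status error counts as a hypothetical stand-in for the list of `ClusteringCommitEvent`s. It sums with `mapToLong(...).sum()` so the message contains the plain number rather than an optional wrapper; the returned strings stand in for `LOG.warn(...)` calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerTaskErrorCountSketch {
  // Builds one warning message per task whose write statuses report error records.
  static List<String> warnings(Map<Integer, List<Long>> errorCountsByTask) {
    List<String> messages = new ArrayList<>();
    errorCountsByTask.forEach((taskId, counts) -> {
      // Sum of error records across the task's write statuses; 0 when the list is empty.
      long failedRecords = counts.stream().mapToLong(Long::longValue).sum();
      if (failedRecords > 0) {
        messages.add("Clustering event with failed record num: " + failedRecords
            + " from task " + taskId);
      }
    });
    return messages;
  }

  public static void main(String[] args) {
    Map<Integer, List<Long>> byTask = new LinkedHashMap<>();
    byTask.put(0, Arrays.asList(0L, 0L));
    byTask.put(1, Arrays.asList(2L, 3L));
    warnings(byTask).forEach(System.out::println);
  }
}
```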





[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1519616180

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2914fd9a3052f735733c8a212644918349943618",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2914fd9a3052f735733c8a212644918349943618",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7fda5fac80a5466d7fccfeb47c82297c54b7988d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564) 
   * 2914fd9a3052f735733c8a212644918349943618 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1520463543

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2914fd9a3052f735733c8a212644918349943618",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606",
       "triggerID" : "2914fd9a3052f735733c8a212644918349943618",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2914fd9a3052f735733c8a212644918349943618 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1518798190

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7fda5fac80a5466d7fccfeb47c82297c54b7988d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564) 
   




[GitHub] [hudi] danny0405 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1177245362


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -150,6 +159,13 @@ private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<
     long numErrorRecords = statuses.stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum).orElse(0L);
 
     if (numErrorRecords > 0 && !this.conf.getBoolean(FlinkOptions.IGNORE_FAILED)) {
+      events.forEach(event -> {
+        Option<Long> failRecordCnt = Option.fromJavaOptional(
+                event.getWriteStatuses().stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum));
+        if (failRecordCnt.isPresent() && failRecordCnt.get() > 0) {
+          LOG.warn("Clustering event with failed record num: " + failRecordCnt + " from task " + event.getTaskID());

Review Comment:
   Or maybe we can just log a message while receiving the event.





[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1522729150

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2914fd9a3052f735733c8a212644918349943618",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606",
       "triggerID" : "2914fd9a3052f735733c8a212644918349943618",
       "triggerType" : "PUSH"
     }, {
       "hash" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16664",
       "triggerID" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2914fd9a3052f735733c8a212644918349943618 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606) 
   * 71bf09ba661b1a296233f14f116894c83788fc64 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16664) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1522722762

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2914fd9a3052f735733c8a212644918349943618",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606",
       "triggerID" : "2914fd9a3052f735733c8a212644918349943618",
       "triggerType" : "PUSH"
     }, {
       "hash" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2914fd9a3052f735733c8a212644918349943618 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606) 
   * 71bf09ba661b1a296233f14f116894c83788fc64 UNKNOWN
   




[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1176619896


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -141,6 +141,15 @@ private void commitIfNecessary(String instant, List<ClusteringCommitEvent> event
     }
   }
 
+  private boolean containFailedEvent(List<ClusteringCommitEvent> events) {
+    return events.stream().anyMatch(event -> {
+      if (event.isFailed()) {
+        LOG.warn("Failed clustering event from task " + event.getTaskID() + ", will rollback instant " + event.getInstant());
+      }

Review Comment:
   Yes, logging the failed task ID tells us which TaskManager produced the failed event; checking that TaskManager's log may then reveal the error.





[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1519679962

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2914fd9a3052f735733c8a212644918349943618",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606",
       "triggerID" : "2914fd9a3052f735733c8a212644918349943618",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7fda5fac80a5466d7fccfeb47c82297c54b7988d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564) 
   * 2914fd9a3052f735733c8a212644918349943618 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1523927396

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2914fd9a3052f735733c8a212644918349943618",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606",
       "triggerID" : "2914fd9a3052f735733c8a212644918349943618",
       "triggerType" : "PUSH"
     }, {
       "hash" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16664",
       "triggerID" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8bf018b22a413e93fb8773605166eede0cf0da62",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16679",
       "triggerID" : "8bf018b22a413e93fb8773605166eede0cf0da62",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8bf018b22a413e93fb8773605166eede0cf0da62 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16679) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1176776518


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -150,6 +159,13 @@ private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<
     long numErrorRecords = statuses.stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum).orElse(0L);
 
     if (numErrorRecords > 0 && !this.conf.getBoolean(FlinkOptions.IGNORE_FAILED)) {
+      events.forEach(event -> {
+        Option<Long> failRecordCnt = Option.fromJavaOptional(
+                event.getWriteStatuses().stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum));
+        if (failRecordCnt.isPresent() && failRecordCnt.get() > 0) {
+          LOG.warn("Clustering event with failed record num: " + failRecordCnt + " from task " + event.getTaskID());

Review Comment:
   If we have 100 task managers running compaction/clustering, it may be hard to find which TM contains the error log.
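
A minimal sketch of the idea in this comment: emit one warning per event that reported error records, tagged with the task ID, so that only the matching task manager log needs to be opened. `SketchEvent` and `FailedTaskLogSketch` are hypothetical stand-ins introduced for illustration, not Hudi's `ClusteringCommitEvent` or `WriteStatus`, and the messages are collected into a list instead of being sent to a logger so the behaviour is easy to check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a commit event; not Hudi's ClusteringCommitEvent. */
final class SketchEvent {
  final int taskId;
  final long[] errorRecordsPerStatus; // one entry per write status (assumed shape)

  SketchEvent(int taskId, long... errorRecordsPerStatus) {
    this.taskId = taskId;
    this.errorRecordsPerStatus = errorRecordsPerStatus;
  }

  /** Sums the error record counts over all write statuses of this event. */
  long totalErrorRecords() {
    long sum = 0;
    for (long n : errorRecordsPerStatus) {
      sum += n;
    }
    return sum;
  }
}

final class FailedTaskLogSketch {
  /** Builds one warning line per event that reported at least one error record. */
  static List<String> warningsFor(List<SketchEvent> events) {
    List<String> warnings = new ArrayList<>();
    for (SketchEvent event : events) {
      long failed = event.totalErrorRecords();
      if (failed > 0) {
        warnings.add("Clustering event with failed record num: " + failed
            + " from task " + event.taskId);
      }
    }
    return warnings;
  }
}
```

With 100 events of which one carries error records, this produces a single line naming that task, which is the troubleshooting shortcut the PR description asks for.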





[GitHub] [hudi] danny0405 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1176009567


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -141,6 +141,15 @@ private void commitIfNecessary(String instant, List<ClusteringCommitEvent> event
     }
   }
 
+  private boolean containFailedEvent(List<ClusteringCommitEvent> events) {
+    return events.stream().anyMatch(event -> {
+      if (event.isFailed()) {
+        LOG.warn("Failed clustering event from task " + event.getTaskID() + ", will rollback instant " + event.getInstant());
+      }

Review Comment:
   Does pointing out the task ID make the log more helpful? If not, just log the message directly at line 124.
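
One detail that bears on this question: in the hunk above the warning is emitted from inside the `anyMatch` predicate. `Stream.anyMatch` is a short-circuiting operation and may stop evaluating once a match is found, so a log placed there can name only the first failed task. The sketch below illustrates that behaviour with a hypothetical `TaskEvent` class (not Hudi's `ClusteringCommitEvent`); it assumes the truncated predicate returns the event's failed flag, and it collects messages in a list instead of using a logger.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a commit event; not Hudi's ClusteringCommitEvent. */
final class TaskEvent {
  final int taskId;
  final boolean failed;

  TaskEvent(int taskId, boolean failed) {
    this.taskId = taskId;
    this.failed = failed;
  }
}

final class AnyMatchLoggingSketch {
  /** Logs inside the anyMatch predicate; evaluation may stop at the first failed event. */
  static List<String> logInsideAnyMatch(List<TaskEvent> events) {
    List<String> log = new ArrayList<>();
    events.stream().anyMatch(event -> {
      if (event.failed) {
        log.add("Failed clustering event from task " + event.taskId);
      }
      return event.failed;
    });
    return log;
  }

  /** Visits every event, so each failed task gets its own log line. */
  static List<String> logEveryFailure(List<TaskEvent> events) {
    List<String> log = new ArrayList<>();
    for (TaskEvent event : events) {
      if (event.failed) {
        log.add("Failed clustering event from task " + event.taskId);
      }
    }
    return log;
  }
}
```

If the goal is to know every task manager whose log is worth checking, the second form is the one that lists them all; if any single failure is enough to trigger the rollback message, the first form suffices.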






[GitHub] [hudi] SteNicholas commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "SteNicholas (via GitHub)" <gi...@apache.org>.
SteNicholas commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1176462241


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -150,6 +159,13 @@ private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<
     long numErrorRecords = statuses.stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum).orElse(0L);
 
     if (numErrorRecords > 0 && !this.conf.getBoolean(FlinkOptions.IGNORE_FAILED)) {
+      events.forEach(event -> {
+        Option<Long> failRecordCnt = Option.fromJavaOptional(
+                event.getWriteStatuses().stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum));
+        if (failRecordCnt.isPresent() && failRecordCnt.get() > 0) {
+          LOG.warn("Clustering event with failed record num: " + failRecordCnt + " from task " + event.getTaskID());

Review Comment:
   `HoodieCreateHandle#write` already logs the failed record, so does this need to log the failed records again?





[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1523204823

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2914fd9a3052f735733c8a212644918349943618",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16606",
       "triggerID" : "2914fd9a3052f735733c8a212644918349943618",
       "triggerType" : "PUSH"
     }, {
       "hash" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16664",
       "triggerID" : "71bf09ba661b1a296233f14f116894c83788fc64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8bf018b22a413e93fb8773605166eede0cf0da62",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16679",
       "triggerID" : "8bf018b22a413e93fb8773605166eede0cf0da62",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 71bf09ba661b1a296233f14f116894c83788fc64 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16664) 
   * 8bf018b22a413e93fb8773605166eede0cf0da62 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16679) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8546:
URL: https://github.com/apache/hudi/pull/8546#discussion_r1178572693


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -97,6 +97,11 @@ public void open(Configuration parameters) throws Exception {
   @Override
   public void invoke(ClusteringCommitEvent event, Context context) throws Exception {
     final String instant = event.getInstant();
+    if (event.isFailed() || event.getWriteStatuses().stream().anyMatch(writeStatus -> writeStatus.getTotalErrorRecords() > 0)) {
+      LOG.warn("Receive abnormal ClusteringCommitEvent of instant " + instant + ", task ID is " + event.getTaskID()
+              + ", is failed: " + event.isFailed() + ", error record count: "

Review Comment:
   I guess the condition `event.isFailed()` is also verbose and unnecessary, because when it is true an exception was very probably thrown, leaving the write statuses empty.
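
To make the case under discussion concrete, here is a small sketch of how each half of the condition in the hunk evaluates. `AbnormalEvent` is a hypothetical stand-in (not Hudi's `ClusteringCommitEvent`), and representing the write statuses as a list of error counts is an assumption made only for illustration.

```java
import java.util.List;

/** Hypothetical stand-in for a commit event; not Hudi's ClusteringCommitEvent. */
final class AbnormalEvent {
  final boolean failed;
  final List<Long> errorRecordsPerStatus; // one entry per write status (assumed shape)

  AbnormalEvent(boolean failed, List<Long> errorRecordsPerStatus) {
    this.failed = failed;
    this.errorRecordsPerStatus = errorRecordsPerStatus;
  }
}

final class AbnormalEventConditionSketch {
  /** Second half of the condition: any write status with error records. */
  static boolean hasErrorRecords(AbnormalEvent event) {
    return event.errorRecordsPerStatus.stream().anyMatch(n -> n > 0);
  }

  /** Full condition, mirroring the shape used in the hunk. */
  static boolean isAbnormal(AbnormalEvent event) {
    return event.failed || hasErrorRecords(event);
  }
}
```

For an event that failed with empty write statuses, `anyMatch` over the empty stream returns false, so in that case only the failed flag makes the full condition true. Whether that case still deserves its own warning in the sink is the judgment call raised in this review comment.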





[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8546:
URL: https://github.com/apache/hudi/pull/8546#issuecomment-1518694509

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564",
       "triggerID" : "7fda5fac80a5466d7fccfeb47c82297c54b7988d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7fda5fac80a5466d7fccfeb47c82297c54b7988d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16564) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

