You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by "ZihanLi58 (via GitHub)" <gi...@apache.org> on 2023/04/24 20:19:27 UTC

[GitHub] [gobblin] ZihanLi58 commented on a diff in pull request #3685: [GOBBLIN-1822]Logging Abnormal Helix Task States

ZihanLi58 commented on code in PR #3685:
URL: https://github.com/apache/gobblin/pull/3685#discussion_r1175743668


##########
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnAutoScalingManager.java:
##########
@@ -222,8 +223,22 @@ void runInternal() {
           if (jobContext != null) {
             log.debug("JobContext {} num partitions {}", jobContext, jobContext.getPartitionSet().size());
 
-            inUseInstances.addAll(jobContext.getPartitionSet().stream().map(jobContext::getAssignedParticipant)
-                .filter(Objects::nonNull).collect(Collectors.toSet()));
+            inUseInstances.addAll(jobContext.getPartitionSet().stream().map(i -> {
+              if(jobContext.getPartitionState(i) == null) {
+                return jobContext.getAssignedParticipant(i);
+              }
+              if (!jobContext.getPartitionState(i).equals(
+                  TaskPartitionState.ERROR) && !jobContext.getPartitionState(i).equals(

Review Comment:
   here are some of my consideration
   1. null check is to prevent the NPE exception when I try to compare the task state with a specific value and try to return getPartitionState specifically to maintain backward compatibility. But I'll refactor the code a little bit to make it look clean
   2. To ensure that logs are helpful and not noisy, I will reduce the amount of information logged for retriable task states. Even if the instances are added to the in-use map, they will be removed automatically during the next run of the method as retry assigns them to new instances, causing old ones to be removed automatically.
   3. To address the issue of tasks failing multiple times, I will add a log for tasks that have a high number of attempts. This will be clearer than logging the unusual task state every time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org