Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/12 15:49:42 UTC

[GitHub] [druid] sub007 edited a comment on issue #10838: Supervisors all UNHEALTHY_TASKS

sub007 edited a comment on issue #10838:
URL: https://github.com/apache/druid/issues/10838#issuecomment-797575217


   I see the same issue as reported above. 
   
   Here is what I observed:
   I have a supervisor running with a task duration of 60 minutes.
   When I check the status of the supervisor, it is UNHEALTHY_TASKS.
   The reason given in the supervisor status is a set of tasks that failed a long time ago, not recently. Those tasks failed with a concurrent execution exception. Since then the supervisor has moved on: it has created multiple new tasks, each of which ingested data from the corresponding Kafka topic for the configured task duration and then terminated.
   Even so, all of those later tasks show a status of failed.
   I checked the task logs, but I did not see any errors in the task index logs.
   There is nothing in the logs of any of the other services either.
   My understanding is that a supervisor moves to the UNHEALTHY_TASKS state when 3 successive tasks end in a failed state, and that it returns to a healthy state when 3 successive tasks end in success.
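
The threshold behaviour described above can be sketched as a small state machine. This is only a model of my understanding, not Druid's actual implementation: the class name `TaskHealthTracker`, the method `record`, and the use of "RUNNING" as the healthy state name are all hypothetical, and the thresholds of 3 are simply the values mentioned above.

```python
from dataclasses import dataclass


@dataclass
class TaskHealthTracker:
    """Hypothetical model of the 3-failures / 3-successes rule (not Druid code)."""

    unhealthy_threshold: int = 3  # successive failures needed to become unhealthy
    healthy_threshold: int = 3    # successive successes needed to recover
    state: str = "RUNNING"        # assumed name for the healthy state
    consecutive_failures: int = 0
    consecutive_successes: int = 0

    def record(self, succeeded: bool) -> str:
        """Record one finished task and return the resulting supervisor state."""
        if succeeded:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if (self.state == "UNHEALTHY_TASKS"
                    and self.consecutive_successes >= self.healthy_threshold):
                self.state = "RUNNING"
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.consecutive_failures >= self.unhealthy_threshold:
                self.state = "UNHEALTHY_TASKS"
        return self.state


if __name__ == "__main__":
    tracker = TaskHealthTracker()
    # Three failures followed by three successes.
    for outcome in [False, False, False, True, True, True]:
        print(outcome, tracker.record(outcome))
```

Under this model, a supervisor whose later tasks all really failed would stay in UNHEALTHY_TASKS, which matches what I see. It does not explain why the status lists only the old failures.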
   
   So there are two questions:
   1. Why do all the tasks end up in a failed state when there are no errors in the logs?
   2. When I look at the supervisor status, why does it list a set of 3 tasks that failed with a concurrent execution exception a long time ago, rather than any of the latest failed tasks, as the reason for its unhealthy state?
   
   My discussion with Peter on the Slack channel is [here](https://the-asf.slack.com/archives/CJ8D1JTB8/p1615212967202500).

