Posted to issues@aurora.apache.org by "Maxim Khutornenko (JIRA)" <ji...@apache.org> on 2015/08/13 21:00:46 UTC

[jira] [Commented] (AURORA-1193) Improve UI task status reporting experience

    [ https://issues.apache.org/jira/browse/AURORA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695767#comment-14695767 ] 

Maxim Khutornenko commented on AURORA-1193:
-------------------------------------------

I compiled all existing Mesos task status update reasons and messages into the table below:

|| Mesos task reason || Mesos message || Aurora message || Aurora task status || Comments ||
| REASON_COMMAND_EXECUTOR_FAILED | "Abnormal executor termination" | same | FAILED | |
| REASON_EXECUTOR_PREEMPTED | none | none | LOST | |
| REASON_EXECUTOR_TERMINATED | "Executor terminating/terminated" | same | LOST | |
| REASON_EXECUTOR_UNREGISTERED | "Unregistered executor" | same | KILLED | _Very confusing_ |
| REASON_FRAMEWORK_REMOVED | "Framework <id> removed" | same | KILLED | |
| REASON_GC_ERROR | "Could not launch the task because we failed to unschedule directories scheduled for gc" | same | LOST | _Potentially confusing_ |
| REASON_INVALID_FRAMEWORKID | unused | | | |
| REASON_INVALID_OFFERS | "Task launched with invalid offers: <details>" | same | LOST | |
| REASON_MASTER_DISCONNECTED | "Master disconnected" | same | LOST | |
| REASON_MEMORY_LIMIT | none | "Task used more memory than requested" | FAILED | |
| REASON_RECONCILIATION | "Reconciliation: <Latest task state \| Task is unknown to the slave \| Task is unknown>" | same | LOST | |
| REASON_RESOURCES_UNKNOWN | "The checkpointed resources being used by the task are unknown to the slave" | same | LOST | _Potentially confusing_ |
| REASON_SLAVE_DISCONNECTED | "Slave <hostname> disconnected" | same | LOST | |
| REASON_SLAVE_REMOVED | "Slave <hostname> removed: <reason>" | same | LOST | |
| REASON_SLAVE_RESTARTED | "Task launched during slave restart" | same | LOST | |
| REASON_SLAVE_UNKNOWN | unused | | | |
| REASON_TASK_INVALID | <reason> | same | LOST | |
| REASON_TASK_UNAUTHORIZED | "Authorization failure: <failure> Not authorized to launch as user <user>" | same | LOST | |
| REASON_TASK_UNKNOWN | "Task is unknown to the slave" | same | LOST | |

The majority of status update messages are actually quite meaningful, and some contain very helpful debugging info that would be impossible to replace on the Aurora side. I am now of the opinion that we should not forcibly alter or suppress all messages. Instead, we should address only the few that a) appear frequently and b) are potentially confusing. Of those flagged in the Comments column above, only "Unregistered executor" meets both criteria. I propose suppressing only that one within the scope of this ticket.
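
To make the proposal concrete, here is a minimal sketch of the filtering rule, written in Python for brevity rather than as scheduler code. The function name `display_message` and the two lookup tables are hypothetical and not part of Aurora; only the reason names and message strings come from the table above.

```python
from typing import Optional

# Hypothetical: reasons whose Mesos message should be hidden from the UI.
# Only the one proposed in this comment is listed.
SUPPRESSED_REASONS = {"REASON_EXECUTOR_UNREGISTERED"}

# Hypothetical: Aurora-side text for reasons where Mesos sends no message
# (taken from the REASON_MEMORY_LIMIT row of the table).
SUBSTITUTED_MESSAGES = {
    "REASON_MEMORY_LIMIT": "Task used more memory than requested",
}


def display_message(reason: Optional[str], message: Optional[str]) -> Optional[str]:
    """Return the text to surface for a status update, or None to show nothing."""
    if reason in SUPPRESSED_REASONS:
        # Frequent and confusing: hide it.
        return None
    if not message:
        # Mesos sent nothing; fall back to an Aurora-side message if one exists.
        return SUBSTITUTED_MESSAGES.get(reason)
    # Everything else passes through unchanged to keep the debugging info.
    return message
```

Keying on the reason rather than on the message text is an assumption made here so that the rule does not depend on the exact wording Mesos uses.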

> Improve UI task status reporting experience
> -------------------------------------------
>
>                 Key: AURORA-1193
>                 URL: https://issues.apache.org/jira/browse/AURORA-1193
>             Project: Aurora
>          Issue Type: Story
>            Reporter: Maxim Khutornenko
>            Priority: Minor
>
> Mesos may attach an optional message to a task status update, which currently surfaces in the UI via TaskEvent. These messages may not be user-friendly and can add confusion. One example is "Unregistered executor", issued when Mesos kills an assigned task that has not yet had a chance to run. While this message does not indicate a failure, it may create an illusion of abnormal behavior in an otherwise normal operation.
> Consider filtering/formatting messages in the UI/scheduler to avoid an adverse user experience. The ideal solution should also leverage the TaskStatus.reason field to show additional status details.


